Instructions

This file (hdat9600_final_assignment.Rmd) is the R Markdown document in which you need to complete your HDAT9600 final assignment. This assignment is assessed and will count for 30% of the total course marks. The assignment comprises two tasks worth 15 marks each. The first task focuses on logistic regression, and the second on survival analysis. There is no word limit, but a report equivalent to about 10 printed pages is appropriate.

Don’t hesitate to ask the course convenor for help via OpenLearning. The course instructors are happy to point you in the right direction and to make suggestions, but they won’t, of course, complete your assignments for you!

Data for this assignment

The data used for this assignment consist of records from Intensive Care Unit (ICU) hospital stays in the USA. All patients were adults who were admitted for a wide variety of reasons. ICU stays of less than 48 hours have been excluded.

The source data for the assignment were made freely available for the 2012 MIT PhysioNet/Computing in Cardiology Challenge. Details are provided here. Training Set A data have been used. The original data have been modified and assembled to suit the purpose of this assignment. While not required for the purposes of this assignment, full details of the preparatory work can be found in the hdat9600_final_assignment_data_preparation file.

The dataframe consists of 120 variables, which are defined as follows:

Patient Descriptor Variables

  • RecordID: a unique integer for each ICU stay
  • Age: years
  • Gender: male/female
  • Height: cm
  • ICUType: Coronary Care Unit; Cardiac Surgery Recovery Unit; Medical ICU; Surgical ICU
  • Length_of_stay: The number of days between the patient’s admission to the ICU and the end of hospitalisation
  • Survival: The number of days between ICU admission and death for patients who died

Outcome Variables

  • in_hospital_death: 0 = survivor, 1 = died in hospital. This is the outcome variable for Task 1: Logistic Regression.
  • Status: True/False. This is the censoring variable for Task 2: Survival Analysis.
  • Days: length of survival (in days). This is the survival time variable for Task 2: Survival Analysis.
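
The relationship between Survival, Status and Days can be illustrated with a small sketch. It is based on an assumption drawn from the summary() and head() output shown under Task 1, not on the data preparation file: patients with no recorded death appear to have Status FALSE and Days equal to 2408, the largest value of Days. The toy values and the name follow_up_end are hypothetical.

```r
# Toy values modelled on the first rows printed by head(); not the real data
toy <- data.frame(
  RecordID = c(132539, 132540, 132543, 132545),
  Survival = c(NA, NA, 575, 918)  # days from ICU admission to death; NA = no death recorded
)

# Assumption: follow-up ends at 2408 days, the maximum of Days in the summary output
follow_up_end <- 2408

# Status is TRUE when a death was observed and FALSE when the record is censored
toy$Status <- !is.na(toy$Survival)

# Days is the time to death if observed, otherwise the assumed censoring time
toy$Days <- ifelse(toy$Status, toy$Survival, follow_up_end)

print(toy)
```

If this reading is right, a Status of FALSE means only that the patient was still alive at the end of follow-up, so Days is a lower bound on survival time for those records.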

Clinical Variables

Use the hyperlinks below to find out more about the clinical meaning of each variable. The first two clinical variables are summary scores that are used to assess patient condition and risk.

  • SAPS-I score [Simplified Acute Physiology Score (Le Gall et al., 1984)]
  • SOFA score [Sequential Organ Failure Assessment (Ferreira et al., 2001)]

The following 36 clinical measures were assessed at multiple timepoints during each patient’s ICU stay. For each of the 36 clinical measures, you are given 3 summary variables: a) the minimum value during the first 24 hours in ICU (_min), b) the maximum value during the first 24 hours in ICU (_max), and c) the difference between the mean and the most extreme values during the first 24 hours in ICU (_diff). For example, for the clinical measure Cholesterol, these three variables are labelled ‘Cholesterol_min’, ‘Cholesterol_max’, and ‘Cholesterol_diff’.

  • Albumin (g/dL)
  • ALP [Alkaline phosphatase (IU/L)]
  • ALT [Alanine transaminase (IU/L)]
  • AST [Aspartate transaminase (IU/L)]
  • Bilirubin (mg/dL)
  • BUN [Blood urea nitrogen (mg/dL)]
  • Cholesterol (mg/dL)
  • Creatinine [Serum creatinine (mg/dL)]
  • DiasABP [Invasive diastolic arterial blood pressure (mmHg)]
  • FiO2 [Fractional inspired O2 (0-1)]
  • GCS [Glasgow Coma Score (3-15)]
  • Glucose [Serum glucose (mg/dL)]
  • HCO3 [Serum bicarbonate (mmol/L)]
  • HCT [Hematocrit (%)]
  • HR [Heart rate (bpm)]
  • K [Serum potassium (mEq/L)]
  • Lactate (mmol/L)
  • Mg [Serum magnesium (mmol/L)]
  • MAP [Invasive mean arterial blood pressure (mmHg)]
  • MechVent [Mechanical ventilation respiration (0:false, or 1:true)]
  • Na [Serum sodium (mEq/L)]
  • NIDiasABP [Non-invasive diastolic arterial blood pressure (mmHg)]
  • NIMAP [Non-invasive mean arterial blood pressure (mmHg)]
  • NISysABP [Non-invasive systolic arterial blood pressure (mmHg)]
  • PaCO2 [Partial pressure of arterial CO2 (mmHg)]
  • PaO2 [Partial pressure of arterial O2 (mmHg)]
  • pH [Arterial pH (0-14)]
  • Platelets (cells/nL)
  • RespRate [Respiration rate (bpm)]
  • SaO2 [O2 saturation in hemoglobin (%)]
  • SysABP [Invasive systolic arterial blood pressure (mmHg)]
  • Temp [Temperature (°C)]
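  • TropI [Troponin-I (μg/L)] (named TroponinI_min, TroponinI_max and TroponinI_diff in the data frame)
  • TropT [Troponin-T (μg/L)] (named TroponinT_min, TroponinT_max and TroponinT_diff in the data frame)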
  • Urine [Urine output (mL)]
  • WBC [White blood cell count (cells/nL)]
  • Weight (kg)

Accessing the Data

The data frames can be loaded with the following code:

    # Get the path of the currently open file and make its folder the working
    # directory, so that the .rds data files stored alongside it can be found.
    # Note: this only works when the code is run interactively in RStudio.
    library(rstudioapi)
    current_path <- rstudioapi::getActiveDocumentContext()$path
    setwd(dirname(current_path))
    
    icu_patients_df0 <- readRDS("icu_patients_df0.rds")
    icu_patients_df1 <- readRDS("icu_patients_df1.rds")

Note: icu_patients_df1 is an imputed (i.e. missing values are ‘derived’) version of icu_patients_df0. This assignment does not concern the methods used for imputation.
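
To see which variables were affected by imputation, you can compare the number of missing values per variable in the two data frames. The following is a minimal sketch on invented toy data: df0_toy and df1_toy are hypothetical stand-ins for icu_patients_df0 and icu_patients_df1, and the mean fill is used only to create a difference between them, not because it is the method used to prepare the assignment data.

```r
# Toy stand-ins for the unimputed and imputed data frames (values invented)
df0_toy <- data.frame(
  Age         = c(54, 76, 44, 68, 88),
  Albumin_min = c(3.1, NA, 2.3, NA, 2.6),
  Height      = c(NA, 175.3, NA, 180.3, NA)
)

# Pretend Albumin_min was imputed (here with the observed mean) and Height was not
df1_toy <- df0_toy
df1_toy$Albumin_min[is.na(df1_toy$Albumin_min)] <-
  mean(df0_toy$Albumin_min, na.rm = TRUE)

# Count missing values per variable in each version and take the difference
na_counts <- data.frame(
  variable    = names(df0_toy),
  n_missing_0 = colSums(is.na(df0_toy)),
  n_missing_1 = colSums(is.na(df1_toy))
)
na_counts$n_imputed <- na_counts$n_missing_0 - na_counts$n_missing_1

print(na_counts)
```

Applied to the real data frames, the same comparison would show which predictors still contain missing values after imputation, which matters because glm() drops incomplete rows by default.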

Task 1 (15 marks)

In this task, you are required to develop a logistic regression model using the icu_patients_df1 data set which adequately explains or predicts the in_hospital_death variable as the outcome using a subset of the available predictor variables. You should fit a series of models, evaluating each one, before you present your final model. Your final model should not include all the predictor variables, just a small subset of them, which you have selected based on statistical significance and/or background knowledge. It is perfectly acceptable to include predictor variables in your final model which are not statistically significant, as long as you justify their inclusion on medical or physiological grounds (you will not be marked down if your medical justification is not exactly correct or complete, but do your best). Aim for between five and ten predictor variables (slightly more or fewer is OK).

You should assess each model you consider for goodness of fit and other relevant statistics to help you choose between them. For your final model, present a set of diagnostic statistics and/or charts and comment on them. You don’t need to do an exhaustive exploratory data analysis of all the variables in the data set, but you should examine those variables that you use in your model. Finally, re-fit your final model to the unimputed data frame (icu_patients_df0.rds) and comment on any differences you find compared to the same model fitted to the imputed data.
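
The general shape of this workflow can be sketched on simulated data. The example below is illustrative only and is not a model for the assignment: the data frame sim, its coefficients and the seed are invented, and the predictors borrow the names Age and GCS_min purely for readability. It fits two nested logistic regression models, compares them, reports odds ratios, and then refits the larger model to a copy of the data with missing values.

```r
set.seed(9600)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in for the assignment data: all values are made up
n <- 500
sim <- data.frame(
  Age     = round(runif(n, 18, 90)),
  GCS_min = sample(3:15, n, replace = TRUE)
)
# Assumed true model: risk of death rises with age and falls with GCS
lin_pred <- -3 + 0.04 * sim$Age - 0.15 * sim$GCS_min
sim$in_hospital_death <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Fit a small and a larger candidate model
m1 <- glm(in_hospital_death ~ Age, data = sim, family = binomial)
m2 <- glm(in_hospital_death ~ Age + GCS_min, data = sim, family = binomial)

# Compare the nested models with a likelihood ratio test and with AIC
lrt <- anova(m1, m2, test = "Chisq")
aic_table <- AIC(m1, m2)

# Odds ratios with Wald 95% confidence intervals for the larger model
est <- coef(summary(m2))
odds_ratios <- exp(cbind(
  OR    = est[, "Estimate"],
  lower = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
  upper = est[, "Estimate"] + 1.96 * est[, "Std. Error"]
))

# Refit the same formula to data with missing values, as a stand-in for
# comparing imputed and unimputed data; glm() drops incomplete rows
sim_na <- sim
sim_na$GCS_min[sample(n, 100)] <- NA
m2_na <- update(m2, data = sim_na)

print(lrt)
print(aic_table)
print(round(odds_ratios, 3))
c(complete = nobs(m2), with_missing = nobs(m2_na))
```

Comparing nobs() between the two fits shows how many records are lost to missing values, which is one reason coefficients and standard errors may differ between the imputed and unimputed data.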

Hints

1. Select an initial subset of explanatory variables that you will use to predict the risk of in-hospital death. Justify your choice.
    summary(icu_patients_df1)
    ##     RecordID      Length_of_stay       SAPS1            SOFA       
    ##  Min.   :132539   Min.   : -1.00   Min.   : 1.00   Min.   :-1.000  
    ##  1st Qu.:133875   1st Qu.:  6.00   1st Qu.:11.00   1st Qu.: 3.000  
    ##  Median :135146   Median : 10.00   Median :15.00   Median : 6.000  
    ##  Mean   :135156   Mean   : 13.74   Mean   :14.96   Mean   : 6.441  
    ##  3rd Qu.:136477   3rd Qu.: 17.00   3rd Qu.:19.00   3rd Qu.: 9.000  
    ##  Max.   :137740   Max.   :154.00   Max.   :34.00   Max.   :22.000  
    ##                                    NA's   :96                      
    ##     Survival      in_hospital_death      Days        Status       
    ##  Min.   :   0.0   Min.   :0.0000    Min.   :   0   Mode :logical  
    ##  1st Qu.:  10.0   1st Qu.:0.0000    1st Qu.: 265   FALSE:1288     
    ##  Median :  68.0   Median :0.0000    Median :2408   TRUE :773      
    ##  Mean   : 343.1   Mean   :0.1441    Mean   :1634                  
    ##  3rd Qu.: 420.0   3rd Qu.:0.0000    3rd Qu.:2408                  
    ##  Max.   :2408.0   Max.   :1.0000    Max.   :2408                  
    ##  NA's   :1288                                                     
    ##       Age         Albumin_diff      Albumin_max     Albumin_min   
    ##  Min.   :16.00   Min.   :0.01866   Min.   :1.100   Min.   :1.100  
    ##  1st Qu.:52.00   1st Qu.:0.28134   1st Qu.:2.600   1st Qu.:2.600  
    ##  Median :67.00   Median :0.48134   Median :3.000   Median :3.000  
    ##  Mean   :64.41   Mean   :0.56829   Mean   :3.045   Mean   :3.012  
    ##  3rd Qu.:78.00   3rd Qu.:0.81866   3rd Qu.:3.500   3rd Qu.:3.500  
    ##  Max.   :90.00   Max.   :2.31866   Max.   :5.300   Max.   :5.300  
    ##                                                                   
    ##     ALP_diff           ALP_max          ALP_min          ALT_diff        
    ##  Min.   :   0.148   Min.   :  19.0   Min.   :  19.0   Min.   :    0.446  
    ##  1st Qu.:  21.852   1st Qu.:  57.0   1st Qu.:  58.0   1st Qu.:   89.446  
    ##  Median :  37.852   Median :  78.0   Median :  76.0   Median :  102.446  
    ##  Mean   :  56.259   Mean   : 105.7   Mean   : 101.4   Mean   :  154.873  
    ##  3rd Qu.:  54.852   3rd Qu.: 110.0   3rd Qu.: 105.0   3rd Qu.:  108.446  
    ##  Max.   :1408.148   Max.   :1504.0   Max.   :1339.0   Max.   :10319.554  
    ##                                                                          
    ##     ALT_max           ALT_min          AST_diff            AST_max       
    ##  Min.   :    3.0   Min.   :   1.0   Min.   :    0.647   Min.   :    5.0  
    ##  1st Qu.:   17.0   1st Qu.:  17.0   1st Qu.:  123.353   1st Qu.:   27.0  
    ##  Median :   30.0   Median :  30.0   Median :  142.353   Median :   51.0  
    ##  Mean   :  118.3   Mean   :  90.1   Mean   :  227.991   Mean   :  188.1  
    ##  3rd Qu.:   69.0   3rd Qu.:  69.0   3rd Qu.:  152.353   3rd Qu.:  130.0  
    ##  Max.   :10440.0   Max.   :9240.0   Max.   :15870.647   Max.   :16040.0  
    ##                                                                          
    ##     AST_min       Bilirubin_diff     Bilirubin_max    Bilirubin_min   
    ##  Min.   :   5.0   Min.   : 0.03596   Min.   : 0.100   Min.   : 0.100  
    ##  1st Qu.:  24.0   1st Qu.: 1.06404   1st Qu.: 0.400   1st Qu.: 0.400  
    ##  Median :  42.0   Median : 1.36404   Median : 0.700   Median : 0.600  
    ##  Mean   : 116.4   Mean   : 1.97637   Mean   : 1.739   Mean   : 1.568  
    ##  3rd Qu.:  87.0   3rd Qu.: 1.46404   3rd Qu.: 1.300   3rd Qu.: 1.100  
    ##  Max.   :7960.0   Max.   :44.13596   Max.   :45.900   Max.   :45.500  
    ##                                                                       
    ##     BUN_diff           BUN_max          BUN_min       Cholesterol_diff  
    ##  Min.   :  0.4729   Min.   :  3.00   Min.   :  2.00   Min.   :  0.5772  
    ##  1st Qu.:  7.4729   1st Qu.: 14.00   1st Qu.: 12.00   1st Qu.: 17.5772  
    ##  Median : 11.5270   Median : 20.00   Median : 18.00   Median : 34.4228  
    ##  Mean   : 15.7904   Mean   : 27.48   Mean   : 24.44   Mean   : 37.2723  
    ##  3rd Qu.: 16.5270   3rd Qu.: 33.00   3rd Qu.: 29.00   3rd Qu.: 55.4228  
    ##  Max.   :172.4729   Max.   :197.00   Max.   :157.00   Max.   :173.5772  
    ##                                                                         
    ##  Cholesterol_max Cholesterol_min Creatinine_diff    Creatinine_max  
    ##  Min.   : 59.0   Min.   : 59     Min.   : 0.03245   Min.   : 0.200  
    ##  1st Qu.:122.0   1st Qu.:121     1st Qu.: 0.33245   1st Qu.: 0.800  
    ##  Median :152.0   Median :152     Median : 0.53245   Median : 1.000  
    ##  Mean   :153.4   Mean   :153     Mean   : 0.86298   Mean   : 1.499  
    ##  3rd Qu.:181.0   3rd Qu.:179     3rd Qu.: 0.73245   3rd Qu.: 1.500  
    ##  Max.   :330.0   Max.   :330     Max.   :20.76755   Max.   :22.000  
    ##                                                                     
    ##  Creatinine_min    DiasABP_diff       DiasABP_max      DiasABP_min    
    ##  Min.   : 0.200   Min.   :  0.5442   Min.   : 22.00   Min.   :  2.00  
    ##  1st Qu.: 0.700   1st Qu.: 16.5442   1st Qu.: 68.00   1st Qu.: 40.00  
    ##  Median : 0.900   Median : 21.5442   Median : 77.00   Median : 46.00  
    ##  Mean   : 1.319   Mean   : 24.5299   Mean   : 78.24   Mean   : 46.56  
    ##  3rd Qu.: 1.300   3rd Qu.: 28.4558   3rd Qu.: 86.00   3rd Qu.: 52.00  
    ##  Max.   :14.100   Max.   :209.4558   Max.   :268.00   Max.   :258.00  
    ##                   NA's   :715        NA's   :715      NA's   :715     
    ##    FiO2_diff          FiO2_max         FiO2_min         GCS_diff    
    ##  Min.   :0.00192   Min.   :0.2800   Min.   :0.2800   Min.   :0.244  
    ##  1st Qu.:0.15192   1st Qu.:0.5000   1st Qu.:0.4000   1st Qu.:3.756  
    ##  Median :0.44808   Median :1.0000   Median :0.4000   Median :3.756  
    ##  Mean   :0.31376   Mean   :0.7874   Mean   :0.4863   Mean   :5.183  
    ##  3rd Qu.:0.44808   3rd Qu.:1.0000   3rd Qu.:0.5000   3rd Qu.:8.244  
    ##  Max.   :0.44808   Max.   :1.0000   Max.   :1.0000   Max.   :8.244  
    ##                                                                     
    ##     GCS_max         GCS_min          Gender      Glucose_diff      
    ##  Min.   : 3.00   Min.   : 3.000   Female: 913   Min.   :   0.1445  
    ##  1st Qu.:11.00   1st Qu.: 3.000   Male  :1148   1st Qu.:  23.8555  
    ##  Median :15.00   Median : 8.000                 Median :  39.1445  
    ##  Mean   :12.87   Mean   : 8.773                 Mean   :  57.0844  
    ##  3rd Qu.:15.00   3rd Qu.:14.000                 3rd Qu.:  61.8555  
    ##  Max.   :15.00   Max.   :15.000                 Max.   :1003.1445  
    ##                                                                    
    ##   Glucose_max      Glucose_min      HCO3_diff          HCO3_max    
    ##  Min.   :  39.0   Min.   : 24.0   Min.   : 0.2275   Min.   : 9.00  
    ##  1st Qu.: 117.0   1st Qu.: 98.0   1st Qu.: 1.7725   1st Qu.:22.00  
    ##  Median : 141.0   Median :117.0   Median : 3.2275   Median :24.00  
    ##  Mean   : 163.3   Mean   :124.8   Mean   : 4.1506   Mean   :24.27  
    ##  3rd Qu.: 180.0   3rd Qu.:141.0   3rd Qu.: 5.2275   3rd Qu.:27.00  
    ##  Max.   :1143.0   Max.   :632.0   Max.   :24.2275   Max.   :47.00  
    ##                                                                    
    ##     HCO3_min        HCT_diff           HCT_max         HCT_min     
    ##  Min.   : 5.00   Min.   : 0.06013   Min.   :21.20   Min.   : 9.00  
    ##  1st Qu.:20.00   1st Qu.: 2.96013   1st Qu.:30.00   1st Qu.:26.20  
    ##  Median :23.00   Median : 5.16013   Median :33.10   Median :29.60  
    ##  Mean   :22.43   Mean   : 5.70366   Mean   :33.57   Mean   :30.08  
    ##  3rd Qu.:25.00   3rd Qu.: 7.66013   3rd Qu.:36.70   3rd Qu.:33.70  
    ##  Max.   :44.00   Max.   :23.43987   Max.   :54.40   Max.   :50.60  
    ##                                                                    
    ##      Height         HR_diff             HR_max          HR_min      
    ##  Min.   : 13.0   Min.   :  0.9221   Min.   : 44.0   Min.   :  0.00  
    ##  1st Qu.:162.6   1st Qu.: 20.0779   1st Qu.: 91.0   1st Qu.: 61.00  
    ##  Median :170.2   Median : 27.0779   Median :104.0   Median : 71.00  
    ##  Mean   :170.0   Mean   : 30.4294   Mean   :106.6   Mean   : 71.99  
    ##  3rd Qu.:177.8   3rd Qu.: 36.9221   3rd Qu.:119.0   3rd Qu.: 81.00  
    ##  Max.   :426.7   Max.   :212.9221   Max.   :300.0   Max.   :126.00  
    ##  NA's   :992                                                        
    ##                           ICUType        K_diff             K_max       
    ##  Coronary Care Unit           :297   Min.   : 0.03521   Min.   : 2.500  
    ##  Cardiac Surgery Recovery Unit:448   1st Qu.: 0.33521   1st Qu.: 4.000  
    ##  Medical ICU                  :788   Median : 0.56479   Median : 4.300  
    ##  Surgical ICU                 :528   Mean   : 0.69010   Mean   : 4.419  
    ##                                      3rd Qu.: 0.86479   3rd Qu.: 4.700  
    ##                                      Max.   :18.76479   Max.   :22.900  
    ##                                                                         
    ##      K_min       Lactate_diff        Lactate_max      Lactate_min    
    ##  Min.   :1.80   Min.   : 0.003596   Min.   : 0.400   Min.   : 0.300  
    ##  1st Qu.:3.50   1st Qu.: 1.096404   1st Qu.: 1.500   1st Qu.: 1.200  
    ##  Median :3.90   Median : 1.503596   Median : 2.200   Median : 1.600  
    ##  Mean   :3.95   Mean   : 1.753380   Mean   : 2.773   Mean   : 1.899  
    ##  3rd Qu.:4.30   3rd Qu.: 1.896404   3rd Qu.: 3.200   3rd Qu.: 2.200  
    ##  Max.   :6.90   Max.   :26.503596   Max.   :29.300   Max.   :24.200  
    ##                                                                      
    ##     MAP_diff           MAP_max         MAP_min          Mg_diff      
    ##  Min.   :  0.2316   Min.   :  4.0   Min.   :  1.00   Min.   :0.0157  
    ##  1st Qu.: 21.7684   1st Qu.: 94.0   1st Qu.: 55.00   1st Qu.:0.1843  
    ##  Median : 29.2316   Median :104.0   Median : 61.00   Median :0.3157  
    ##  Mean   : 38.4735   Mean   :111.8   Mean   : 62.76   Mean   :0.4181  
    ##  3rd Qu.: 41.2316   3rd Qu.:117.0   3rd Qu.: 70.00   3rd Qu.:0.5843  
    ##  Max.   :213.2316   Max.   :291.0   Max.   :265.00   Max.   :7.9157  
    ##                                                                      
    ##      Mg_max          Mg_min         Na_diff            Na_max     
    ##  Min.   :1.100   Min.   :0.600   Min.   : 0.2066   Min.   :112.0  
    ##  1st Qu.:1.900   1st Qu.:1.600   1st Qu.: 1.7934   1st Qu.:137.0  
    ##  Median :2.100   Median :1.800   Median : 3.2066   Median :140.0  
    ##  Mean   :2.153   Mean   :1.857   Mean   : 4.1146   Mean   :139.8  
    ##  3rd Qu.:2.400   3rd Qu.:2.100   3rd Qu.: 5.2066   3rd Qu.:142.0  
    ##  Max.   :9.900   Max.   :6.200   Max.   :41.2066   Max.   :177.0  
    ##                                                                   
    ##      Na_min    NIDiasABP_diff    NIDiasABP_max    NIDiasABP_min  
    ##  Min.   : 98   Min.   :  0.491   Min.   : 29.00   Min.   :10.00  
    ##  1st Qu.:136   1st Qu.: 17.509   1st Qu.: 64.00   1st Qu.:33.00  
    ##  Median :138   Median : 25.500   Median : 76.00   Median :42.00  
    ##  Mean   :138   Mean   : 26.964   Mean   : 76.92   Mean   :43.17  
    ##  3rd Qu.:141   3rd Qu.: 33.509   3rd Qu.: 89.00   3rd Qu.:52.00  
    ##  Max.   :160   Max.   :116.509   Max.   :174.00   Max.   :97.00  
    ##                NA's   :455       NA's   :455      NA's   :455    
    ##    NIMAP_diff         NIMAP_max        NIMAP_min      NISysABP_diff     
    ##  Min.   :  0.0407   Min.   : 47.33   Min.   :  7.00   Min.   :  0.3013  
    ##  1st Qu.: 18.2893   1st Qu.: 81.08   1st Qu.: 52.33   1st Qu.: 25.6987  
    ##  Median : 24.7107   Median : 93.67   Median : 60.00   Median : 34.3013  
    ##  Mean   : 26.9759   Mean   : 94.47   Mean   : 61.69   Mean   : 37.7962  
    ##  3rd Qu.: 33.2893   3rd Qu.:106.00   3rd Qu.: 70.00   3rd Qu.: 45.6987  
    ##  Max.   :113.2893   Max.   :189.00   Max.   :121.00   Max.   :157.3013  
    ##  NA's   :455        NA's   :455      NA's   :455      NA's   :453       
    ##   NISysABP_max    NISysABP_min      PaCO2_diff        PaCO2_max    
    ##  Min.   : 78.0   Min.   :  4.00   Min.   : 0.3358   Min.   :16.00  
    ##  1st Qu.:121.0   1st Qu.: 83.00   1st Qu.: 5.6642   1st Qu.:39.00  
    ##  Median :138.0   Median : 95.00   Median : 8.6642   Median :44.00  
    ##  Mean   :140.5   Mean   : 96.55   Mean   :10.7463   Mean   :45.56  
    ##  3rd Qu.:156.0   3rd Qu.:108.00   3rd Qu.:13.3358   3rd Qu.:50.00  
    ##  Max.   :274.0   Max.   :234.00   Max.   :57.6642   Max.   :98.00  
    ##  NA's   :453     NA's   :453                                       
    ##    PaCO2_min       PaO2_diff           PaO2_max        PaO2_min    
    ##  Min.   : 0.30   Min.   :  0.6179   Min.   : 27.0   Min.   : 20.0  
    ##  1st Qu.:32.00   1st Qu.: 67.6179   1st Qu.:123.0   1st Qu.: 74.0  
    ##  Median :36.00   Median : 90.6179   Median :191.0   Median : 92.0  
    ##  Mean   :36.72   Mean   :119.5407   Mean   :223.5   Mean   :105.8  
    ##  3rd Qu.:40.00   3rd Qu.:154.3821   3rd Qu.:311.0   3rd Qu.:122.0  
    ##  Max.   :93.00   Max.   :341.3821   Max.   :500.0   Max.   :477.0  
    ##                                                                    
    ##     pH_diff             pH_max          pH_min      Platelets_diff    
    ##  Min.   :0.000114   Min.   :7.150   Min.   :3.000   Min.   :  0.2307  
    ##  1st Qu.:0.059886   1st Qu.:7.380   1st Qu.:7.280   1st Qu.: 39.7693  
    ##  Median :0.089886   Median :7.420   Median :7.340   Median : 72.7693  
    ##  Mean   :0.098486   Mean   :7.418   Mean   :7.327   Mean   : 92.5348  
    ##  3rd Qu.:0.120114   3rd Qu.:7.460   3rd Qu.:7.390   3rd Qu.:116.7693  
    ##  Max.   :4.369886   Max.   :7.690   Max.   :7.630   Max.   :857.2307  
    ##                                                                       
    ##  Platelets_max    Platelets_min   RespRate_diff      RespRate_max  
    ##  Min.   :  18.0   Min.   :  9.0   Min.   : 0.6514   Min.   :13.00  
    ##  1st Qu.: 157.0   1st Qu.:126.0   1st Qu.: 7.3486   1st Qu.:24.00  
    ##  Median : 210.0   Median :184.0   Median : 9.6514   Median :27.00  
    ##  Mean   : 228.9   Mean   :197.9   Mean   :11.6075   Mean   :29.12  
    ##  3rd Qu.: 275.0   3rd Qu.:246.0   3rd Qu.:13.6514   3rd Qu.:33.00  
    ##  Max.   :1047.0   Max.   :891.0   Max.   :78.6514   Max.   :98.00  
    ##                                                                    
    ##   RespRate_min     SaO2_diff          SaO2_max         SaO2_min     
    ##  Min.   : 4.00   Min.   : 0.2461   Min.   : 75.00   Min.   : 33.00  
    ##  1st Qu.:12.00   1st Qu.: 0.7539   1st Qu.: 97.00   1st Qu.: 95.00  
    ##  Median :14.00   Median : 1.7539   Median : 98.00   Median : 97.00  
    ##  Mean   :14.25   Mean   : 2.5635   Mean   : 97.44   Mean   : 95.85  
    ##  3rd Qu.:17.00   3rd Qu.: 3.2461   3rd Qu.: 99.00   3rd Qu.: 98.00  
    ##  Max.   :24.00   Max.   :64.2461   Max.   :100.00   Max.   :100.00  
    ##                                                                     
    ##   SysABP_diff        SysABP_max      SysABP_min       Temp_diff      
    ##  Min.   :  3.689   Min.   : 52.0   Min.   : 11.00   Min.   : 0.1259  
    ##  1st Qu.: 32.310   1st Qu.:135.0   1st Qu.: 79.00   1st Qu.: 0.8741  
    ##  Median : 40.690   Median :149.0   Median : 88.00   Median : 1.2741  
    ##  Mean   : 45.008   Mean   :152.1   Mean   : 90.91   Mean   : 1.3756  
    ##  3rd Qu.: 53.690   3rd Qu.:167.0   3rd Qu.:102.00   3rd Qu.: 1.7259  
    ##  Max.   :178.690   Max.   :295.0   Max.   :262.00   Max.   :12.7741  
    ##  NA's   :715       NA's   :715     NA's   :715                       
    ##     Temp_max        Temp_min     TroponinI_diff    TroponinI_max  
    ##  Min.   :35.40   Min.   :24.20   Min.   : 0.1571   Min.   : 0.30  
    ##  1st Qu.:37.10   1st Qu.:35.60   1st Qu.: 4.6429   1st Qu.: 2.60  
    ##  Median :37.60   Median :36.10   Median : 5.2571   Median : 7.80  
    ##  Mean   :37.69   Mean   :36.01   Mean   :10.1737   Mean   :11.83  
    ##  3rd Qu.:38.20   3rd Qu.:36.60   3rd Qu.:12.1571   3rd Qu.:17.60  
    ##  Max.   :42.10   Max.   :38.30   Max.   :37.9571   Max.   :43.40  
    ##                                                                   
    ##  TroponinI_min   TroponinT_diff    TroponinT_max     TroponinT_min    
    ##  Min.   : 0.30   Min.   : 0.0215   Min.   : 0.0100   Min.   : 0.0100  
    ##  1st Qu.: 1.30   1st Qu.: 0.5785   1st Qu.: 0.0600   1st Qu.: 0.0400  
    ##  Median : 6.80   Median : 0.6285   Median : 0.1700   Median : 0.1200  
    ##  Mean   :10.06   Mean   : 1.0920   Mean   : 0.9079   Mean   : 0.6347  
    ##  3rd Qu.:13.20   3rd Qu.: 0.6585   3rd Qu.: 0.8000   3rd Qu.: 0.4700  
    ##  Max.   :42.90   Max.   :23.7915   Max.   :24.4600   Max.   :22.9300  
    ##                                                                       
    ##    Urine_diff        Urine_max        Urine_min         WBC_diff        
    ##  Min.   :  19.22   Min.   :   0.0   Min.   :  0.00   Min.   :  0.03315  
    ##  1st Qu.: 100.78   1st Qu.: 200.0   1st Qu.:  0.00   1st Qu.:  2.63315  
    ##  Median : 300.78   Median : 400.0   Median : 20.00   Median :  4.53315  
    ##  Mean   : 438.25   Mean   : 521.8   Mean   : 34.55   Mean   :  5.82079  
    ##  3rd Qu.: 525.78   3rd Qu.: 625.0   3rd Qu.: 36.00   3rd Qu.:  7.23315  
    ##  Max.   :4900.78   Max.   :5000.0   Max.   :600.00   Max.   :143.46685  
    ##                                                                         
    ##     WBC_max          WBC_min        Weight_diff          Weight_max    
    ##  Min.   :  0.10   Min.   :  0.10   Min.   :  0.00012   Min.   : 34.60  
    ##  1st Qu.:  9.30   1st Qu.:  7.60   1st Qu.:  7.60000   1st Qu.: 66.00  
    ##  Median : 12.30   Median : 10.40   Median : 14.70012   Median : 80.00  
    ##  Mean   : 13.95   Mean   : 11.51   Mean   : 18.17040   Mean   : 82.66  
    ##  3rd Qu.: 16.90   3rd Qu.: 14.10   3rd Qu.: 24.80000   3rd Qu.: 94.55  
    ##  Max.   :155.60   Max.   :128.30   Max.   :149.30012   Max.   :230.00  
    ##                                    NA's   :146         NA's   :146     
    ##    Weight_min    
    ##  Min.   : 34.60  
    ##  1st Qu.: 65.00  
    ##  Median : 77.70  
    ##  Mean   : 80.86  
    ##  3rd Qu.: 91.95  
    ##  Max.   :230.00  
    ##  NA's   :146
    head(icu_patients_df1)
    ##   RecordID Length_of_stay SAPS1 SOFA Survival in_hospital_death Days Status Age
    ## 1   132539              5     6    1       NA                 0 2408  FALSE  54
    ## 2   132540              8    16    8       NA                 0 2408  FALSE  76
    ## 3   132541             19    21   11       NA                 0 2408  FALSE  44
    ## 4   132543              9     7    1      575                 0  575   TRUE  68
    ## 5   132545              4    17    2      918                 0  918   TRUE  88
    ## 6   132547              6    14   11     1637                 0 1637   TRUE  64
    ##   Albumin_diff Albumin_max Albumin_min   ALP_diff ALP_max ALP_min  ALT_diff
    ## 1    0.2186633         3.2         3.1 118.147964     214     202  80.44617
    ## 2    0.8813367         2.1         2.2 252.147964     338     348  94.44617
    ## 3    0.6813367         2.7         2.3  31.147964     127     105  45.44617
    ## 4    1.4186633         4.4         4.4   9.147964     105     105 108.44617
    ## 5    0.3813367         2.7         2.6  56.852036      39      78  96.44617
    ## 6    0.4186633         3.4         3.3   5.147964     101     101  75.44617
    ##   ALT_max ALT_min  AST_diff AST_max AST_min Bilirubin_diff Bilirubin_max
    ## 1      40      75 131.35271      38      53       1.464039           0.4
    ## 2     206      26 116.35271      53      74       1.564039           1.2
    ## 3      91      75  65.64729     235     164       1.235961           3.0
    ## 4      12      12 154.35271      15      15       1.564039           0.2
    ## 5      24      32 154.35271      15      97       1.364039           0.4
    ## 6      60      45 122.35271     162      47       1.364039           0.4
    ##   Bilirubin_min  BUN_diff BUN_max BUN_min Cholesterol_diff Cholesterol_max
    ## 1           0.3 11.527053      13      13         16.42276             154
    ## 2           0.2  8.527053      18      16         28.42276             139
    ## 3           2.8 21.527053       8       3         56.42276             111
    ## 4           0.2  4.527053      23      20         37.42276             127
    ## 5           0.9 20.472947      45      45         55.42276             104
    ## 6           0.4  9.527053      19      15         55.57724             212
    ##   Cholesterol_min Creatinine_diff Creatinine_max Creatinine_min DiasABP_diff
    ## 1             140       0.4324463            0.8            0.8           NA
    ## 2             128       0.4324463            1.2            0.8     26.54421
    ## 3             100       0.9324463            0.4            0.3           NA
    ## 4             119       0.5324463            0.9            0.7           NA
    ## 5             101       0.2324463            1.0            1.0           NA
    ## 6             212       0.3324463            1.4            0.9     20.45579
    ##   DiasABP_max DiasABP_min  FiO2_diff FiO2_max FiO2_min GCS_diff GCS_max GCS_min
    ## 1          NA          NA 0.05192012      0.5      0.5 3.755971      15      15
    ## 2          81          32 0.44807988      1.0      0.4 8.244029      15       3
    ## 3          NA          NA 0.44807988      1.0      0.5 6.244029       8       5
    ## 4          NA          NA 0.44807988      1.0      0.4 3.755971      15      14
    ## 5          NA          NA 0.15192012      0.4      0.5 3.755971      15      15
    ## 6          79          55 0.05192012      0.5      0.5 4.244029       9       7
    ##   Gender Glucose_diff Glucose_max Glucose_min HCO3_diff HCO3_max HCO3_min
    ## 1 Female     65.14446         205         205  3.227452       26       26
    ## 2   Male     34.85554         105         105  1.772548       22       21
    ## 3 Female     20.85554         141         119  3.227452       26       24
    ## 4   Male     33.85554         129         106  5.227452       28       27
    ## 5 Female     26.85554         113         113  4.772548       18       18
    ## 6   Male    124.14446         264         197  3.772548       19       19
    ##    HCT_diff HCT_max HCT_min Height   HR_diff HR_max HR_min
    ## 1  2.739871    33.7    33.5     NA 29.077891     80     58
    ## 2  6.260129    29.7    24.7  175.3  7.077891     88     80
    ## 3  4.260129    28.5    26.7     NA 30.077891    113     57
    ## 4 10.339871    41.3    36.1  180.3 30.077891     88     57
    ## 5  8.360129    30.8    22.6     NA 20.077891     94     67
    ## 6 10.639871    41.6    36.8  180.3 16.077891     91     71
    ##                         ICUType    K_diff K_max K_min Lactate_diff Lactate_max
    ## 1                  Surgical ICU 0.2647934   4.4   4.4    0.9964037         1.9
    ## 2 Cardiac Surgery Recovery Unit 0.1647934   4.3   4.3    1.4964037         2.9
    ## 3                   Medical ICU 4.4647934   8.6   3.3    1.4964037         1.9
    ## 4                   Medical ICU 0.1352066   4.2   4.0    1.5964037         1.2
    ## 5                   Medical ICU 1.8647934   6.0   3.8    0.8964037         2.0
    ## 6            Coronary Care Unit 0.9647934   5.1   3.8    1.8964037         0.9
    ##   Lactate_min MAP_diff MAP_max MAP_min   Mg_diff Mg_max Mg_min   Na_diff Na_max
    ## 1         1.8 31.23164     109      56 0.4842982    1.5    1.5 2.2066071    137
    ## 2         1.3 34.76836     100      43 1.1157018    3.1    1.9 0.2066071    139
    ## 3         1.3 53.23164     131      71 0.6842982    1.9    1.3 2.2066071    140
    ## 4         1.5 24.23164     102      72 0.1157018    2.1    2.1 1.7933929    141
    ## 5         1.9  9.76836      78      68 0.4842982    1.5    1.5 0.7933929    140
    ## 6         1.3 24.23164     102      62 0.2842982    1.7    1.7 2.2066071    141
    ##   Na_min NIDiasABP_diff NIDiasABP_max NIDiasABP_min NIMAP_diff NIMAP_max
    ## 1    137       17.49101            65            40   17.04069     92.33
    ## 2    139       19.49101            65            38   26.38069     86.33
    ## 3    137       37.50899            95            66   34.28931    110.00
    ## 4    140       23.50899            81            54   24.98931    100.70
    ## 5    140       38.50899            96            29   29.98931    105.70
    ## 6    137       31.50899            89            52   26.58931    102.30
    ##   NIMAP_min NISysABP_diff NISysABP_max NISysABP_min PaCO2_diff PaCO2_max
    ## 1     58.67      40.30125          157           96   3.335797        37
    ## 2     49.33      44.69875          129           72   7.335797        41
    ## 3     83.33      33.30125          150          111   3.335797        37
    ## 4     73.00      23.30125          140          102   9.335797        38
    ## 5     63.67      39.30125          156          119   6.335797        34
    ## 6     61.67      35.69875          129           81   5.335797        45
    ##   PaCO2_min PaO2_diff PaO2_max PaO2_min    pH_diff pH_max pH_min Platelets_diff
    ## 1        38  47.61789      186      111 0.12011376   7.49   7.43       31.23069
    ## 2        33 286.38211      445       89 0.08011376   7.45   7.34       36.23069
    ## 3        37  93.61789       65       65 0.14011376   7.51   7.51      117.76931
    ## 4        31  94.61789      148       64 0.14011376   7.51   7.47      201.23069
    ## 5        35  80.61789       78       84 0.04011376   7.38   7.41       80.76931
    ## 6        35  80.61789      101       78 0.07988624   7.40   7.29       86.23069
    ##   Platelets_max Platelets_min RespRate_diff RespRate_max RespRate_min SaO2_diff
    ## 1           221           221       7.34858           24           12  3.246079
    ## 2           226           164      16.65142           36           11  1.753921
    ## 3            84            72      13.65142           33           18  2.246079
    ## 4           391           315       7.34858           21           12  1.753921
    ## 5           109           109       6.65142           26           15  3.246079
    ## 6           276           219      27.65142           47           20  1.246079
    ##   SaO2_max SaO2_min SysABP_diff SysABP_max SysABP_min Temp_diff Temp_max
    ## 1       98       94          NA         NA         NA  1.874083     38.1
    ## 2       99       97     50.3105        135         66  2.474083     37.9
    ## 3       95       95          NA         NA         NA  2.025917     39.0
    ## 4       99       97          NA         NA         NA  1.874083     36.7
    ## 5       97       94          NA         NA         NA  1.174083     37.8
    ## 6       97       96     43.3105        152         73  1.174083     37.8
    ##   Temp_min TroponinI_diff TroponinI_max TroponinI_min TroponinT_diff
    ## 1     35.1      5.1429448           1.0           0.3      0.4785006
    ## 2     34.5     26.2570552          31.7          16.1      0.6485006
    ## 3     36.7     31.2570552          33.4          36.7      0.8814994
    ## 4     35.1      0.8570552           5.9           6.3      0.6485006
    ## 5     35.8      0.1570552           5.6           5.6      0.6085006
    ## 6     35.8      4.1429448           1.3           1.3      0.6385006
    ##   TroponinT_max TroponinT_min Urine_diff Urine_max Urine_min   WBC_diff WBC_max
    ## 1          0.58          0.19  800.78242       900        30  0.9331524    11.2
    ## 2          0.43          0.02  670.78242       770         0  4.7331524    13.1
    ## 3          1.55          1.41  310.78242       410        30  8.4331524     4.2
    ## 4          0.10          0.02  600.78242       700       100  3.3331524    11.5
    ## 5          0.06          0.37   83.21758       150        16  8.3331524     3.8
    ## 6          0.03          0.10 1100.78242      1200        40 11.8668476    24.0
    ##   WBC_min Weight_diff Weight_max Weight_min
    ## 1    11.2          NA         NA         NA
    ## 2     7.4    4.699878       80.6       76.0
    ## 3     3.7   23.999878       56.7       56.7
    ## 4     8.8    3.900122       84.6       84.6
    ## 5     3.8          NA         NA         NA
    ## 6    14.4   33.300122      114.0      114.0
    # Initial variables used to predict in-hospital death
    # Based on background knowledge
    
    # Albumin_min : low albumin assoc with malnutrition 
    # Bilirubin_max: may indicate liver failure and also is included in SOFA scoring
    # BUN_max : high urea assoc with renal impairment
    # Creatinine_max : high creatinine assoc with renal impairment
    # FiO2_max : high oxygen requirements indicate lung pathology
    # GCS_min : low GCS indicates head pathology
    # Glucose_min : both hypo and hyperglycaemia can contribute to mortality
    # Glucose_max
    # HCT_min : HCT or haemoglobin assoc with mortality
    # HR_min : too high or too low HR can cause cardiac issues
    # HR_max
    # K_min : both hypo and hyperkalaemia can indicate pathology
    # K_max
    # Lactate_max : high lactate is a marker of inadequate organ perfusion
    # Mg_min : low magnesium could result in arrhythmias
    # MAP_min : hypoperfusion leads to morbidity
    # MAP_max : hyperperfusion / dysregulation of circulation could lead to morbidity
    # MAP_diff
    # MechVent : mech vent associated with mortality
    # Na_min : both hyponatraemia and hypernatraemia can lead to brain oedema / damage / seizures
    # Na_max
    # PaCO2_min 
    # PaCO2_max : hypercapnia can result from ventilatory failure
    # PaO2_min : low PaO2 indicates hypoxaemia
    # pH_min : both acidaemia and alkalaemia indicate systemic or renal pathology
    # pH_max
    # Platelets_min : low platelets indicate higher risk of bleeding
    # RespRate_max : high RR is related to mortality
    # SaO2_min : low oxygen saturation indicates hypoxaemia
    # Temp_min : hypothermia / hyperthermia may lead to morbidity
    # Temp_max
    # TroponinT_max : indicates myocardial injury
    # Urine_min : low urine output indicates renal impairment
    # WBC_max : presence of infection
    # Weight : whether obesity has any contributing factors
    
    # Non clinical variables:
    # Age
    # Gender
    # Height (and therefore BMI)
    # ICUType
    # Length_of_stay
    # SAPS1
    
    # Variables whose clinical significance I am unsure of:
    # ALP / ALT / AST
    # HCO3
    1. Conduct basic exploratory data analysis on your variables of choice.
    # add in BMI variable
    # icu_patients_df1$BMI <- (icu_patients_df1$Weight_min) / (0.01*icu_patients_df1$Height)^2
    # summary(icu_patients_df1$Weight_min)
    # summary(icu_patients_df1$Height) # max height is 426.7 cm, which is implausible
    # summary(icu_patients_df1$BMI)
    # BMI is not meaningful because of the implausible Height values, so this code is not run
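
One possible way to rescue BMI, sketched under assumptions rather than as part of the analysis above, is to treat heights outside a plausible adult range as missing before computing BMI. The 120 to 230 cm limits and the `toy` data frame below are hypothetical and chosen only for illustration; only the column names `Height` and `Weight_min` come from the data.

```r
# Hypothetical toy data; only the column names mirror icu_patients_df1
toy <- data.frame(
  Height     = c(180, 426.7, 165, 13),   # cm; 426.7 and 13 are implausible
  Weight_min = c(80, 70, 60, 90)         # kg
)

# Assumed plausibility limits for adult height in cm (not from the source)
height_ok <- !is.na(toy$Height) & toy$Height >= 120 & toy$Height <= 230

# Set implausible heights to NA so they cannot produce absurd BMI values
toy$Height_clean <- ifelse(height_ok, toy$Height, NA)

# BMI = weight (kg) / height (m)^2
toy$BMI <- toy$Weight_min / (0.01 * toy$Height_clean)^2

summary(toy$BMI)
```

Rows with an implausible height end up with a missing BMI, so they would be dropped from any model that uses BMI rather than distorting it.
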
    
    # Basic EDA for each variable
    library(ggplot2)
    table(icu_patients_df1$in_hospital_death) # 297 deaths out of 2061 observations = 14.4%
    ## 
    ##    0    1 
    ## 1764  297
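
The 14.4% in-hospital mortality can be checked directly from the two counts printed above. A minimal sketch using only those counts (on the real data, `prop.table(table(icu_patients_df1$in_hospital_death))` should give the same proportions):

```r
# Counts copied from the table() output above
counts <- c(survived = 1764, died = 297)

n_total   <- sum(counts)            # total number of ICU stays
prop_died <- counts / n_total       # proportion in each outcome group

n_total
round(100 * prop_died, 1)
```
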
    # Patients who died had lower albumin
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Albumin_min))+ geom_boxplot()

    # Patients who died had higher bilirubin
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Bilirubin_max))+ geom_boxplot()

    # Patients who died had higher urea
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = BUN_max))+ geom_boxplot()

    # Patients who died had higher creatinine
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Creatinine_max))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = FiO2_max))+ geom_boxplot()

    # Patients who died had lower GCS
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = GCS_min))+ geom_boxplot()

    # Little to no difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Glucose_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Glucose_max))+ geom_boxplot()

    # Patients who died had slightly lower HCT
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = HCT_min))+ geom_boxplot()

    # Patients who died had slightly higher HR_min and HR_max
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = HR_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = HR_max))+ geom_boxplot()

    # Little to no difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = K_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = K_max))+ geom_boxplot()

    # Patients who died had slightly higher lactate
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Lactate_max))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Mg_min))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = MAP_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = MAP_max))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = MAP_diff))+ geom_boxplot()

    # MechVent column was deleted from the data
    # ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = MechVent))+ geom_boxplot()
    
    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Na_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Na_max))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = PaCO2_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = PaCO2_max))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = PaO2_min))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = pH_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = pH_max))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Platelets_min))+ geom_boxplot()

    # Patients who died had higher RR
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = RespRate_max))+ geom_boxplot()

    # More low-saturation outliers among patients who survived
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = SaO2_min))+ geom_boxplot()

    # Patients who died had slightly lower temperatures
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Temp_min))+ geom_boxplot()

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Temp_max))+ geom_boxplot()

    # Patients who died had possibly slightly higher TroponinT
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = TroponinT_max))+ geom_boxplot()

    # Patients who died had slightly less urine output
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Urine_min))+ geom_boxplot()

    # Patients who died had slightly higher WBC
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = WBC_max))+ geom_boxplot()

    # BMI data are not interpretable
    # ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = BMI))+ geom_boxplot()
    
    # Using weight instead: roughly the same in both groups
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Weight_min))+ geom_boxplot()
    ## Warning: Removed 146 rows containing non-finite values (stat_boxplot).

    # Patients who died were older
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Age))+ geom_boxplot()

    # No difference
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = Length_of_stay))+ geom_boxplot()

    # Patients who died had higher SAPS1 and SOFA scores
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = SAPS1))+ geom_boxplot()
    ## Warning: Removed 96 rows containing non-finite values (stat_boxplot).

    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = SOFA))+ geom_boxplot()

    # The Cardiac Surgery Recovery Unit has a smaller death circle than the other three ICU types,
    # i.e. a lower proportion of in-hospital deaths relative to survivors
    ggplot(data=icu_patients_df1, mapping = aes(x = in_hospital_death=="1", y = ICUType)) + 
      geom_count(aes(size = after_stat(prop), group = ICUType)) + 
      scale_size_area(max_size = 50)
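
The boxplots above are compared by eye. A numeric summary per outcome group can back up those impressions. The sketch below uses base R `aggregate()` on a small simulated data frame; `toy` and its values are made up for illustration, and only the column names are taken from the data set.

```r
set.seed(1)

# Simulated stand-in for icu_patients_df1 (values are hypothetical)
n <- 200
toy <- data.frame(
  in_hospital_death = rbinom(n, 1, 0.15),
  Albumin_min       = rnorm(n, mean = 3, sd = 0.5),
  BUN_max           = rexp(n, rate = 1 / 30),
  Age               = round(runif(n, 18, 90))
)

vars <- c("Albumin_min", "BUN_max", "Age")

# Median of each variable within survivors (0) and deaths (1)
medians <- aggregate(toy[vars],
                     by  = list(in_hospital_death = toy$in_hospital_death),
                     FUN = median, na.rm = TRUE)
medians
```

Medians are used rather than means because several of the laboratory variables are strongly right-skewed in the boxplots.
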

    1. Fit appropriate univariate logistic regression models.
    # Based on the univariate boxplot comparisons above,
    # removed: Mg_min, Na_min, Na_max, MAP_diff, MAP_max
    
    ### significant variables ###
    minAlbumin_glm <- glm(in_hospital_death ~ Albumin_min, data=icu_patients_df1, family="binomial")
    summary(minAlbumin_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Albumin_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.7948  -0.5887  -0.5385  -0.4595   2.2842  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -0.36392    0.29389  -1.238    0.216    
    ## Albumin_min -0.48186    0.09987  -4.825  1.4e-06 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1676.0  on 2059  degrees of freedom
    ## AIC: 1680
    ## 
    ## Number of Fisher Scoring iterations: 4
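
The `Albumin_min` coefficient is on the log-odds scale, so exponentiating it gives an odds ratio per one-unit increase in minimum albumin. A minimal sketch using the estimate and standard error printed above, with a Wald 95% interval (an approximation; `confint()` on the fitted model would give a profile-likelihood interval instead):

```r
# Estimate and standard error copied from summary(minAlbumin_glm)
est <- -0.48186
se  <- 0.09987

# Odds ratio per one-unit increase in Albumin_min
or <- exp(est)

# Wald 95% confidence interval on the odds-ratio scale
z     <- qnorm(0.975)
or_ci <- exp(est + c(-1, 1) * z * se)

round(c(OR = or, lower = or_ci[1], upper = or_ci[2]), 2)
```

An odds ratio below 1 means that higher minimum albumin is associated with lower odds of in-hospital death in this univariate model.
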
    maxBili_glm <- glm(in_hospital_death ~ Bilirubin_max, data=icu_patients_df1, family="binomial")
    summary(maxBili_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Bilirubin_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.3889  -0.5421  -0.5363  -0.5321   2.0174  
    ## 
    ## Coefficients:
    ##               Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)   -1.90053    0.06866 -27.679  < 2e-16 ***
    ## Bilirubin_max  0.05692    0.01135   5.013 5.35e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1676.8  on 2059  degrees of freedom
    ## AIC: 1680.8
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxUrea_glm <- glm(in_hospital_death ~ BUN_max, data=icu_patients_df1, family="binomial")
    summary(maxUrea_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ BUN_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.0462  -0.5269  -0.4789  -0.4443   2.2309  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.492189   0.103693 -24.034   <2e-16 ***
    ## BUN_max      0.022610   0.002347   9.634   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1607.3  on 2059  degrees of freedom
    ## AIC: 1611.3
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxCr_glm <- glm(in_hospital_death ~ Creatinine_max, data=icu_patients_df1, family="binomial")
    summary(maxCr_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Creatinine_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.8627  -0.5433  -0.5270  -0.5151   2.0633  
    ## 
    ## Coefficients:
    ##                Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)    -2.05087    0.08430 -24.328  < 2e-16 ***
    ## Creatinine_max  0.16325    0.03135   5.208 1.91e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1674.4  on 2059  degrees of freedom
    ## AIC: 1678.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    minGCS_glm <- glm(in_hospital_death ~ GCS_min, data=icu_patients_df1, family="binomial")
    summary(minGCS_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ GCS_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6238  -0.6238  -0.5394  -0.4853   2.0964  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.40261    0.12298 -11.405  < 2e-16 ***
    ## GCS_min     -0.04514    0.01317  -3.426 0.000612 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1687.7  on 2059  degrees of freedom
    ## AIC: 1691.7
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxGlu_glm <- glm(in_hospital_death ~ Glucose_max, data=icu_patients_df1, family="binomial")
    summary(maxGlu_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Glucose_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.4117  -0.5572  -0.5343  -0.5162   2.0872  
    ## 
    ## Coefficients:
    ##               Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.1865370  0.1202802 -18.179  < 2e-16 ***
    ## Glucose_max  0.0023817  0.0005819   4.093 4.25e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1684.2  on 2059  degrees of freedom
    ## AIC: 1688.2
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxHR_glm <- glm(in_hospital_death ~ HR_max, data=icu_patients_df1, family="binomial")
    summary(maxHR_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ HR_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.1194  -0.5733  -0.5402  -0.5067   2.1517  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.707555   0.303251  -8.928  < 2e-16 ***
    ## HR_max       0.008565   0.002707   3.164  0.00156 ** 
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1689.9  on 2059  degrees of freedom
    ## AIC: 1693.9
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxLactate_glm <- glm(in_hospital_death ~ Lactate_max, data=icu_patients_df1, family="binomial")
    summary(maxLactate_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Lactate_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.1726  -0.5544  -0.5200  -0.4939   2.1212  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)  -2.1932     0.1005 -21.820  < 2e-16 ***
    ## Lactate_max   0.1372     0.0244   5.625 1.86e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1669.5  on 2059  degrees of freedom
    ## AIC: 1673.5
    ## 
    ## Number of Fisher Scoring iterations: 4
    minPaCO2_glm <- glm(in_hospital_death ~ PaCO2_min, data=icu_patients_df1, family="binomial")
    summary(minPaCO2_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ PaCO2_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.8026  -0.5767  -0.5530  -0.5081   2.4175  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)   
    ## (Intercept) -0.960864   0.295925  -3.247  0.00117 **
    ## PaCO2_min   -0.022689   0.008111  -2.797  0.00516 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1691.4  on 2059  degrees of freedom
    ## AIC: 1695.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    minpH_glm <- glm(in_hospital_death ~ pH_min, data=icu_patients_df1, family="binomial")
    summary(minpH_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ pH_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.9980  -0.5733  -0.5358  -0.4868   2.2874  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)  19.5912     4.8996   3.998 6.37e-05 ***
    ## pH_min       -2.9197     0.6699  -4.358 1.31e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1677.4  on 2059  degrees of freedom
    ## AIC: 1681.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxRR_glm <- glm(in_hospital_death ~ RespRate_max, data=icu_patients_df1, family="binomial")
    summary(maxRR_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ RespRate_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.4489  -0.5679  -0.5233  -0.4817   2.1771  
    ## 
    ## Coefficients:
    ##               Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)  -2.835656   0.235885 -12.021  < 2e-16 ***
    ## RespRate_max  0.035250   0.007412   4.756 1.98e-06 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1677.8  on 2059  degrees of freedom
    ## AIC: 1681.8
    ## 
    ## Number of Fisher Scoring iterations: 4
    minTemp_glm <- glm(in_hospital_death ~ Temp_min, data=icu_patients_df1, family="binomial")
    summary(minTemp_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Temp_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.8918  -0.5741  -0.5409  -0.4973   2.2040  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)   7.4599     2.3473   3.178  0.00148 ** 
    ## Temp_min     -0.2571     0.0654  -3.931 8.45e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1684.3  on 2059  degrees of freedom
    ## AIC: 1688.3
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxTropT_glm <- glm(in_hospital_death ~ TroponinT_max, data=icu_patients_df1, family="binomial")
    summary(maxTropT_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ TroponinT_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.0740  -0.5503  -0.5430  -0.5416   1.9965  
    ## 
    ## Coefficients:
    ##               Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)   -1.84719    0.06917 -26.705   <2e-16 ***
    ## TroponinT_max  0.06537    0.02638   2.478   0.0132 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1694.1  on 2059  degrees of freedom
    ## AIC: 1698.1
    ## 
    ## Number of Fisher Scoring iterations: 4
    minUrine_glm <- glm(in_hospital_death ~ Urine_min, data=icu_patients_df1, family="binomial")
    summary(minUrine_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Urine_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6034  -0.5952  -0.5631  -0.5105   2.9438  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.610937   0.076052 -21.182  < 2e-16 ***
    ## Urine_min   -0.006020   0.001787  -3.369 0.000756 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1683.1  on 2059  degrees of freedom
    ## AIC: 1687.1
    ## 
    ## Number of Fisher Scoring iterations: 5
    maxWBC_glm <- glm(in_hospital_death ~ WBC_max, data=icu_patients_df1, family="binomial")
    summary(maxWBC_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ WBC_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.2674  -0.5631  -0.5475  -0.5326   2.0545  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.982652   0.118080 -16.791   <2e-16 ***
    ## WBC_max      0.014086   0.006859   2.054     0.04 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1695.7  on 2059  degrees of freedom
    ## AIC: 1699.7
    ## 
    ## Number of Fisher Scoring iterations: 4
    age_glm <- glm(in_hospital_death ~ Age, data=icu_patients_df1, family="binomial")
    summary(age_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Age, family = "binomial", data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.7522  -0.6264  -0.5111  -0.3919   2.5135  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -3.761624   0.303337 -12.401  < 2e-16 ***
    ## Age          0.029376   0.004229   6.947 3.73e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1644.9  on 2059  degrees of freedom
    ## AIC: 1648.9
    ## 
    ## Number of Fisher Scoring iterations: 5
    gender_glm <- glm(in_hospital_death ~ Gender, data=icu_patients_df1, family="binomial")
    summary(gender_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Gender, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.5612  -0.5612  -0.5553  -0.5553   1.9728  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.76894    0.09381 -18.856   <2e-16 ***
    ## GenderMale  -0.02281    0.12615  -0.181    0.856    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1699.7  on 2059  degrees of freedom
    ## AIC: 1703.7
    ## 
    ## Number of Fisher Scoring iterations: 4
    icuType_glm <- glm(in_hospital_death ~ ICUType, data=icu_patients_df1, family="binomial")
    summary(icuType_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ ICUType, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6402  -0.6402  -0.5615  -0.3458   2.3861  
    ## 
    ## Coefficients:
    ##                                      Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)                           -1.6463     0.1576 -10.443  < 2e-16 ***
    ## ICUTypeCardiac Surgery Recovery Unit  -1.1407     0.2563  -4.451 8.55e-06 ***
    ## ICUTypeMedical ICU                     0.1653     0.1824   0.906    0.365    
    ## ICUTypeSurgical ICU                   -0.1214     0.2001  -0.607    0.544    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1655.3  on 2057  degrees of freedom
    ## AIC: 1663.3
    ## 
    ## Number of Fisher Scoring iterations: 5
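
The three `ICUType` contrasts above each compare one unit against the reference level (Coronary Care Unit), so they do not test the factor as a whole. A likelihood-ratio test does, by comparing the drop in deviance with a chi-squared distribution. A sketch using the deviances printed above (on the fitted model, `anova(icuType_glm, test = "Chisq")` should report the same test):

```r
# Deviances and degrees of freedom copied from summary(icuType_glm)
null_dev  <- 1699.7; null_df  <- 2060
resid_dev <- 1655.3; resid_df <- 2057

# Likelihood-ratio statistic and its degrees of freedom
lr_stat <- null_dev - resid_dev      # drop in deviance
lr_df   <- null_df - resid_df        # number of ICUType contrasts

# Upper-tail chi-squared p-value
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```
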
    SAPS_glm <- glm(in_hospital_death ~ SAPS1, data=icu_patients_df1, family="binomial")
    summary(SAPS_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ SAPS1, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.3834  -0.5894  -0.4662  -0.3448   2.6278  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -3.79727    0.23888 -15.896   <2e-16 ***
    ## SAPS1        0.12558    0.01338   9.384   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1627.0  on 1964  degrees of freedom
    ## Residual deviance: 1530.8  on 1963  degrees of freedom
    ##   (96 observations deleted due to missingness)
    ## AIC: 1534.8
    ## 
    ## Number of Fisher Scoring iterations: 5
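
The SAPS1 model was fitted to 1965 observations because 96 rows have a missing `SAPS1`, so its AIC and deviance are not directly comparable with the models fitted to all 2061 rows. One option is to refit the competing model on the same complete cases. The sketch below shows the idea on simulated data; `toy` and its values are hypothetical.

```r
set.seed(42)

# Simulated stand-in for icu_patients_df1 (values are hypothetical)
n <- 500
toy <- data.frame(
  in_hospital_death = rbinom(n, 1, 0.15),
  SAPS1             = rpois(n, 14),
  SOFA              = rpois(n, 6)
)
toy$SAPS1[sample(n, 25)] <- NA      # introduce missing SAPS1 scores

# Restrict both models to rows where SAPS1 is observed
toy_cc <- toy[!is.na(toy$SAPS1), ]

saps_fit <- glm(in_hospital_death ~ SAPS1, data = toy_cc, family = "binomial")
sofa_fit <- glm(in_hospital_death ~ SOFA,  data = toy_cc, family = "binomial")

# Both fits now use the same observations, so the AICs are comparable
c(n_saps = nobs(saps_fit), n_sofa = nobs(sofa_fit))
AIC(saps_fit, sofa_fit)
```
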
    SOFA_glm <- glm(in_hospital_death ~ SOFA, data=icu_patients_df1, family="binomial")
    summary(SOFA_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ SOFA, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.0747  -0.5835  -0.4771  -0.3623   2.4609  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.83453    0.14090 -20.117   <2e-16 ***
    ## SOFA         0.14378    0.01539   9.342   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1607.5  on 2059  degrees of freedom
    ## AIC: 1611.5
    ## 
    ## Number of Fisher Scoring iterations: 5
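
The SAPS1 and SOFA coefficients above are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The following is a minimal sketch that copies the estimates and standard errors from the printed summaries and builds approximate 95% Wald intervals; the object names `est`, `se` and `or_table` are illustrative and not part of the original analysis.

```r
# Estimates and standard errors copied from the summary() output above.
est <- c(SAPS1 = 0.12558, SOFA = 0.14378)
se  <- c(SAPS1 = 0.01338, SOFA = 0.01539)

# Odds ratio per one-point increase in each score, with a 95% Wald interval.
z <- qnorm(0.975)
or_table <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - z * se),
  upper_95   = exp(est + z * se)
)
print(round(or_table, 3))
```

With the fitted objects available, `exp(coef(SAPS_glm))` should give the same point estimate. Note that the SAPS1 model was fitted on 96 fewer observations than the SOFA model, so their AIC values are not directly comparable.
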
    ### variables that are not significant but are clinically relevant ###
    maxFiO2_glm <- glm(in_hospital_death ~ FiO2_max, data=icu_patients_df1, family="binomial")
    summary(maxFiO2_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ FiO2_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.5634  -0.5634  -0.5555  -0.5504   1.9898  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)  -1.8614     0.2102  -8.856   <2e-16 ***
    ## FiO2_max      0.1011     0.2533   0.399     0.69    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1699.5  on 2059  degrees of freedom
    ## AIC: 1703.5
    ## 
    ## Number of Fisher Scoring iterations: 4
    minHCT_glm <- glm(in_hospital_death ~ HCT_min, data=icu_patients_df1, family="binomial")
    summary(minHCT_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ HCT_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6521  -0.5743  -0.5512  -0.5203   2.1340  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.16563    0.33501  -3.479 0.000503 ***
    ## HCT_min     -0.02064    0.01111  -1.857 0.063306 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1696.2  on 2059  degrees of freedom
    ## AIC: 1700.2
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxK_glm <- glm(in_hospital_death ~ K_max, data=icu_patients_df1, family="binomial")
    summary(maxK_glm) # not significant but is included in the SAPS score
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ K_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.2402  -0.5620  -0.5512  -0.5380   2.0561  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.24634    0.28233  -7.956 1.77e-15 ***
    ## K_max        0.10449    0.06153   1.698   0.0895 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1697.0  on 2059  degrees of freedom
    ## AIC: 1701
    ## 
    ## Number of Fisher Scoring iterations: 4
    minMAP_glm <- glm(in_hospital_death ~ MAP_min, data=icu_patients_df1, family="binomial")
    summary(minMAP_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ MAP_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6583  -0.5674  -0.5551  -0.5341   2.4214  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.413112   0.257434  -5.489 4.04e-08 ***
    ## MAP_min     -0.005926   0.004051  -1.463    0.143    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1697.4  on 2059  degrees of freedom
    ## AIC: 1701.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxPaCO2_glm <- glm(in_hospital_death ~ PaCO2_max, data=icu_patients_df1, family="binomial")
    summary(maxPaCO2_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ PaCO2_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6389  -0.5715  -0.5556  -0.5300   2.1798  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.32279    0.27600  -4.793 1.65e-06 ***
    ## PaCO2_max   -0.01017    0.00601  -1.691   0.0908 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1696.7  on 2059  degrees of freedom
    ## AIC: 1700.7
    ## 
    ## Number of Fisher Scoring iterations: 4
    minPaO2_glm <- glm(in_hospital_death ~ PaO2_min, data=icu_patients_df1, family="binomial")
    summary(minPaO2_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ PaO2_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.5747  -0.5632  -0.5591  -0.5484   2.0765  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.702103   0.143274 -11.880   <2e-16 ***
    ## PaO2_min    -0.000757   0.001235  -0.613     0.54    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1699.3  on 2059  degrees of freedom
    ## AIC: 1703.3
    ## 
    ## Number of Fisher Scoring iterations: 4
    minPlt_glm <- glm(in_hospital_death ~ Platelets_min, data=icu_patients_df1, family="binomial")
    summary(minPlt_glm) # not significant but has been included in the SOFA score
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Platelets_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6122  -0.5735  -0.5558  -0.5260   2.2141  
    ## 
    ## Coefficients:
    ##                 Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)   -1.5693181  0.1352494 -11.603   <2e-16 ***
    ## Platelets_min -0.0010963  0.0006322  -1.734   0.0829 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1696.6  on 2059  degrees of freedom
    ## AIC: 1700.6
    ## 
    ## Number of Fisher Scoring iterations: 4
    minSaO2_glm <- glm(in_hospital_death ~ SaO2_min, data=icu_patients_df1, family="binomial")
    summary(minSaO2_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ SaO2_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.5753  -0.5578  -0.5575  -0.5573   1.9703  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)
    ## (Intercept) -1.680258   1.535685  -1.094    0.274
    ## SaO2_min    -0.001057   0.016011  -0.066    0.947
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1699.7  on 2059  degrees of freedom
    ## AIC: 1703.7
    ## 
    ## Number of Fisher Scoring iterations: 4
    minWeight_glm <- glm(in_hospital_death ~ Weight_min, data=icu_patients_df1, family="binomial")
    summary(minWeight_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Weight_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6288  -0.5816  -0.5622  -0.5305   2.1416  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.343757   0.241289  -5.569 2.56e-08 ***
    ## Weight_min  -0.005110   0.002943  -1.736   0.0826 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1604.2  on 1914  degrees of freedom
    ## Residual deviance: 1601.0  on 1913  degrees of freedom
    ##   (146 observations deleted due to missingness)
    ## AIC: 1605
    ## 
    ## Number of Fisher Scoring iterations: 4
    LOS_glm <- glm(in_hospital_death ~ Length_of_stay, data=icu_patients_df1, family="binomial")
    summary(LOS_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Length_of_stay, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.8655  -0.5594  -0.5466  -0.5413   2.0058  
    ## 
    ## Coefficients:
    ##                 Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)    -1.882249   0.089155 -21.112   <2e-16 ***
    ## Length_of_stay  0.007099   0.004331   1.639    0.101    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1697.2  on 2059  degrees of freedom
    ## AIC: 1701.2
    ## 
    ## Number of Fisher Scoring iterations: 4
    ### not significant and may not be clinically relevant ###
    minGlu_glm <- glm(in_hospital_death ~ Glucose_min, data=icu_patients_df1, family="binomial")
    summary(minGlu_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Glucose_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.7799  -0.5613  -0.5522  -0.5428   2.0271  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.967537   0.171241 -11.490   <2e-16 ***
    ## Glucose_min  0.001476   0.001253   1.178    0.239    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1698.4  on 2059  degrees of freedom
    ## AIC: 1702.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    minHR_glm <- glm(in_hospital_death ~ HR_min, data=icu_patients_df1, family="binomial")
    summary(minHR_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ HR_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6235  -0.5656  -0.5528  -0.5390   2.1087  
    ## 
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.108733   0.301434  -6.996 2.64e-12 ***
    ## HR_min       0.004520   0.004052   1.115    0.265    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1698.4  on 2059  degrees of freedom
    ## AIC: 1702.4
    ## 
    ## Number of Fisher Scoring iterations: 4
    minK_glm <- glm(in_hospital_death ~ K_min, data=icu_patients_df1, family="binomial")
    summary(minK_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ K_min, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6024  -0.5647  -0.5546  -0.5447   2.0345  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -1.47413    0.42361  -3.480 0.000502 ***
    ## K_min       -0.07804    0.10660  -0.732 0.464083    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1699.1  on 2059  degrees of freedom
    ## AIC: 1703.1
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxpH_glm <- glm(in_hospital_death ~ pH_max, data=icu_patients_df1, family="binomial")
    summary(maxpH_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ pH_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6684  -0.5677  -0.5523  -0.5297   2.0743  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)   9.3001     7.0197   1.325    0.185
    ## pH_max       -1.4944     0.9469  -1.578    0.115
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1697.2  on 2059  degrees of freedom
    ## AIC: 1701.2
    ## 
    ## Number of Fisher Scoring iterations: 4
    maxTemp_glm <- glm(in_hospital_death ~ Temp_max, data=icu_patients_df1, family="binomial")
    summary(maxTemp_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Temp_max, family = "binomial", 
    ##     data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -0.6077  -0.5689  -0.5549  -0.5366   2.1386  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)  1.60419    3.08152   0.521    0.603
    ## Temp_max    -0.08988    0.08183  -1.098    0.272
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1699.7  on 2060  degrees of freedom
    ## Residual deviance: 1698.5  on 2059  degrees of freedom
    ## AIC: 1702.5
    ## 
    ## Number of Fisher Scoring iterations: 4
    # IN SUMMARY:
    # Variables that are significant on univariate analysis and clinically relevant are:
    # minAlbumin, maxBili, maxUrea, maxCr, minGCS, maxGlu, maxHR, maxLactate, minPaCO2
    # minpH, maxRR, minTemp, maxTropT, minUrine, maxWBC
    # age, gender, icuType, SAPS, SOFA
    
    # Variables not significant but still clinically relevant are:
    # maxFiO2, minHCT, maxK, minMAP, maxPaCO2, minPaO2, minPlt, minSaO2, minWeight, LOS
    
    # Variables that are not significant and may not be clinically relevant are:
    # minGlu, minHR, minK, maxpH, maxTemp
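
The univariate screening above repeats the same `glm()` call for each variable. As a sketch of a more compact alternative, the loop below fits one logistic regression per predictor and collects the slope, p-value and number of observations used into a single table. The data frame `sim_df` is simulated purely for illustration (it is not the assignment data), and the names `predictors` and `screen` are assumptions.

```r
set.seed(1)

# Simulated stand-in for icu_patients_df1 (hypothetical values only).
n <- 1000
sim_df <- data.frame(
  SOFA     = rpois(n, 6),
  FiO2_max = runif(n, 0.3, 1),
  Temp_max = rnorm(n, 37.5, 0.8)
)
sim_df$in_hospital_death <- rbinom(n, 1, plogis(-3.5 + 0.3 * sim_df$SOFA))

predictors <- c("SOFA", "FiO2_max", "Temp_max")

# Fit one univariate model per predictor and keep the slope row of its coefficient table.
screen <- do.call(rbind, lapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = "in_hospital_death"),
             data = sim_df, family = binomial)
  slope <- coef(summary(fit))[2, ]
  data.frame(variable = v,
             estimate = unname(slope["Estimate"]),
             p_value  = unname(slope["Pr(>|z|)"]),
             n_used   = nobs(fit))
}))
print(screen)
```

Reporting `n_used` alongside each p-value makes it visible when a model has silently dropped rows because of missing values, as happened for SAPS1 and Weight_min above.
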
    1. Fit an appropriate series of multivariable logistic regression models, justifying your approach. Assess each model you consider for goodness of fit and other relevant statistics.
    # Considering ALL variables:
    # column 1 is record id
    # column 2 is Length_of_stay
    # column 3 is SAPS1
    # column 4 is SOFA
    # column 5 is survival
    # column 7 is days
    # column 8 is status
    # column 9 is age
    # column 43 is gender
    # column 53 is Height
    # column 57 is ICUType
    # these columns are excluded here (column 6, in_hospital_death, is kept as the outcome);
    # the relevant ones are added back in later models
    
    
    # Split the variables into min, max and diff sets and model each set separately.
    # Fitting all of them in a single model leads to a linearity (collinearity) error.
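
One plausible reading of the linearity error mentioned above is perfect collinearity: if each `_diff` column equals the matching `_max` minus `_min` (an assumption, not confirmed here), the three columns are linearly dependent and `glm()` cannot estimate all three coefficients. The sketch below reproduces that situation with simulated data; `x_min`, `x_max`, `x_diff` and `fit_all` are hypothetical names.

```r
set.seed(2)
n <- 300

# Hypothetical measurements; x_diff is constructed as max minus min (an assumption).
x_min  <- rnorm(n, 60, 10)
x_max  <- x_min + rexp(n, 1 / 15)
x_diff <- x_max - x_min
y      <- rbinom(n, 1, plogis(-1 + 0.03 * x_diff))

# All three columns in one model: the design matrix is rank deficient.
fit_all <- glm(y ~ x_min + x_max + x_diff, family = binomial)
print(coef(fit_all))   # the aliased term is reported as NA
print(fit_all$rank)    # fewer estimable coefficients than terms in the formula
```

Modelling the min, max and diff sets separately, as done here, is one way to sidestep the dependence.
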
    
    ### min ICU data ###
    
    minICUdata <- icu_patients_df1[, -c(1,2,3,4,5,7,8,9,43,53,57)] # remove columns as above
    minICUdata <- minICUdata[, c(1, seq(from=4, to=109, by = 3))] # every third column starting from a min column
    
    minICU_glm <- glm(in_hospital_death ~ . ,data=minICUdata, family="binomial")
    summary(minICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ ., family = "binomial", data = minICUdata)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.2955  -0.5727  -0.4072  -0.2776   2.7828  
    ## 
    ## Coefficients:
    ##                   Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)      1.773e+01  9.773e+00   1.814  0.06970 .  
    ## Albumin_min     -2.524e-01  1.715e-01  -1.472  0.14113    
    ## ALP_min          5.055e-04  1.257e-03   0.402  0.68760    
    ## ALT_min          2.757e-05  8.303e-04   0.033  0.97352    
    ## AST_min          4.045e-04  7.875e-04   0.514  0.60751    
    ## Bilirubin_min    4.668e-02  2.370e-02   1.970  0.04889 *  
    ## BUN_min          3.791e-02  6.614e-03   5.733 9.89e-09 ***
    ## Cholesterol_min  1.887e-03  2.871e-03   0.657  0.51087    
    ## Creatinine_min  -1.937e-01  1.019e-01  -1.901  0.05726 .  
    ## DiasABP_min     -3.576e-02  1.620e-02  -2.207  0.02731 *  
    ## FiO2_min         4.107e-02  7.920e-01   0.052  0.95864    
    ## GCS_min         -3.161e-02  2.894e-02  -1.092  0.27471    
    ## Glucose_min      1.368e-03  1.915e-03   0.714  0.47517    
    ## HCO3_min         6.303e-03  3.293e-02   0.191  0.84824    
    ## HCT_min          5.159e-02  2.041e-02   2.528  0.01146 *  
    ## HR_min           2.609e-03  7.060e-03   0.370  0.71170    
    ## K_min            5.810e-02  1.864e-01   0.312  0.75524    
    ## Lactate_min      2.505e-01  9.530e-02   2.629  0.00857 ** 
    ## MAP_min          5.312e-03  1.173e-02   0.453  0.65059    
    ## Mg_min          -1.834e-01  2.599e-01  -0.706  0.48047    
    ## Na_min          -4.391e-02  2.213e-02  -1.984  0.04722 *  
    ## NIDiasABP_min   -7.657e-03  2.002e-02  -0.383  0.70205    
    ## NIMAP_min        5.751e-03  2.884e-02   0.199  0.84197    
    ## NISysABP_min    -4.665e-03  1.165e-02  -0.400  0.68893    
    ## PaCO2_min       -1.525e-02  1.924e-02  -0.792  0.42822    
    ## PaO2_min         4.963e-03  2.060e-03   2.409  0.01600 *  
    ## pH_min          -8.939e-01  1.288e+00  -0.694  0.48759    
    ## Platelets_min   -1.537e-03  1.223e-03  -1.257  0.20866    
    ## RespRate_min     4.789e-02  2.834e-02   1.690  0.09110 .  
    ## SaO2_min        -2.344e-03  2.228e-02  -0.105  0.91620    
    ## SysABP_min       1.156e-02  8.330e-03   1.387  0.16532    
    ## Temp_min        -2.360e-01  1.047e-01  -2.254  0.02420 *  
    ## TroponinI_min    6.947e-04  9.713e-03   0.072  0.94298    
    ## TroponinT_min    4.483e-02  7.785e-02   0.576  0.56474    
    ## Urine_min       -9.429e-03  4.344e-03  -2.171  0.02994 *  
    ## WBC_min          7.786e-03  1.606e-02   0.485  0.62787    
    ## Weight_min      -5.260e-03  4.887e-03  -1.076  0.28176    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 824.13  on 900  degrees of freedom
    ## Residual deviance: 666.99  on 864  degrees of freedom
    ##   (1160 observations deleted due to missingness)
    ## AIC: 740.99
    ## 
    ## Number of Fisher Scoring iterations: 6
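    # Bilirubin_min, BUN_min, HCT_min, Lactate_min, Temp_min, Na_min, PaO2_min, Urine_min were statistically significant (p < 0.05)
    # DiasABP_min (p = 0.027) was also significant in this output but is not carried into the combined model below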
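
The positional indexing used to pick out the min, max and diff columns depends on the exact column order of the data frame. A sketch of a name-based alternative follows, assuming the column names end in `_min`, `_max` or `_diff` as the model output suggests. The helper `select_by_suffix()` and the toy data frame are hypothetical.

```r
# Hypothetical miniature of the layout: identifiers, outcome, then diff/max/min triplets.
toy_df <- data.frame(
  RecordID          = 1:4,
  in_hospital_death = c(0, 1, 0, 0),
  Albumin_diff = c(0.2, 0.5, 0.1, 0.0),
  Albumin_max  = c(3.1, 2.9, 3.8, 4.0),
  Albumin_min  = c(2.9, 2.4, 3.7, 4.0),
  BUN_diff = c(4, 20, 2, 1),
  BUN_max  = c(18, 60, 14, 11),
  BUN_min  = c(14, 40, 12, 10)
)

# Keep the outcome plus every column whose name ends in "_<suffix>".
select_by_suffix <- function(df, suffix, outcome = "in_hospital_death") {
  keep <- grepl(paste0("_", suffix, "$"), names(df))
  df[, c(outcome, names(df)[keep]), drop = FALSE]
}

min_data <- select_by_suffix(toy_df, "min")
print(names(min_data))
```

The result can be passed to `glm(in_hospital_death ~ ., ...)` in the same way as `minICUdata`, and the selection keeps working if columns are later added or reordered.
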
    
    
    ###  max ICU data ### 
    maxICUdata <- icu_patients_df1[, -c(1,2,3,4,5,7,8,9,43,53,57)] # remove columns as above
    maxICUdata <- maxICUdata[, c(1, seq(from=3, to=109, by = 3))] # every third column starting from a max column
    
    maxICU_glm <- glm(in_hospital_death ~ . ,data=maxICUdata, family="binomial")
    summary(maxICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ ., family = "binomial", data = maxICUdata)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.3984  -0.5545  -0.3605  -0.1874   2.8434  
    ## 
    ## Coefficients:
    ##                   Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)     21.2860253 18.0303070   1.181  0.23777    
    ## Albumin_max     -0.0825905  0.1815881  -0.455  0.64924    
    ## ALP_max          0.0015035  0.0012509   1.202  0.22938    
    ## ALT_max         -0.0008123  0.0008072  -1.006  0.31427    
    ## AST_max          0.0006405  0.0005096   1.257  0.20878    
    ## Bilirubin_max    0.0343710  0.0221317   1.553  0.12042    
    ## BUN_max          0.0254793  0.0063233   4.029 5.59e-05 ***
    ## Cholesterol_max -0.0018775  0.0033489  -0.561  0.57504    
    ## Creatinine_max  -0.1781563  0.0855192  -2.083  0.03723 *  
    ## DiasABP_max     -0.0316657  0.0097136  -3.260  0.00111 ** 
    ## FiO2_max         0.6685025  0.4957475   1.348  0.17751    
    ## GCS_max         -0.1939523  0.0361370  -5.367 8.00e-08 ***
    ## Glucose_max      0.0011024  0.0011521   0.957  0.33864    
    ## HCO3_max        -0.0372768  0.0365012  -1.021  0.30714    
    ## HCT_max         -0.0096146  0.0239799  -0.401  0.68846    
    ## HR_max           0.0028140  0.0050193   0.561  0.57505    
    ## K_max            0.0947209  0.1390359   0.681  0.49570    
    ## Lactate_max      0.1428551  0.0539140   2.650  0.00806 ** 
    ## MAP_max          0.0037985  0.0030598   1.241  0.21446    
    ## Mg_max          -0.3885880  0.2473900  -1.571  0.11624    
    ## Na_max          -0.0451421  0.0230563  -1.958  0.05024 .  
    ## NIDiasABP_max    0.0131768  0.0150320   0.877  0.38071    
    ## NIMAP_max       -0.0080854  0.0200386  -0.403  0.68659    
    ## NISysABP_max     0.0057593  0.0082049   0.702  0.48272    
    ## PaCO2_max        0.0059686  0.0126673   0.471  0.63751    
    ## PaO2_max        -0.0011683  0.0011319  -1.032  0.30201    
    ## pH_max           1.3750765  2.1827519   0.630  0.52871    
    ## Platelets_max   -0.0023159  0.0011068  -2.092  0.03640 *  
    ## RespRate_max     0.0063621  0.0166524   0.382  0.70242    
    ## SaO2_max        -0.1018260  0.0597825  -1.703  0.08852 .  
    ## SysABP_max       0.0093153  0.0055154   1.689  0.09123 .  
    ## Temp_max        -0.3690320  0.1461566  -2.525  0.01157 *  
    ## TroponinI_max   -0.0089373  0.0108825  -0.821  0.41150    
    ## TroponinT_max    0.0463654  0.0581954   0.797  0.42561    
    ## Urine_max       -0.0009914  0.0003290  -3.013  0.00259 ** 
    ## WBC_max          0.0049944  0.0134355   0.372  0.71009    
    ## Weight_max      -0.0055211  0.0045915  -1.202  0.22919    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 824.13  on 900  degrees of freedom
    ## Residual deviance: 624.41  on 864  degrees of freedom
    ##   (1160 observations deleted due to missingness)
    ## AIC: 698.41
    ## 
    ## Number of Fisher Scoring iterations: 6
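    # BUN_max, Creatinine_max, GCS_max, Platelets_max, Temp_max, Urine_max were statistically significant (p < 0.05)
    # DiasABP_max (p = 0.001) and Lactate_max (p = 0.008) were also significant in this output but are not carried into the combined model below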
    
    ### diff ICU data ### 
    diffICUdata <- icu_patients_df1[, -c(1,2,3,4,5,7,8,9,43,53,57)] # remove columns as above
    diffICUdata <- diffICUdata[, c(1, seq(from=2, to=109, by = 3))] # every third column starting from a diff column
    
    diffICU_glm <- glm(in_hospital_death ~ . ,data=diffICUdata, family="binomial")
    summary(diffICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ ., family = "binomial", data = diffICUdata)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.7007  -0.5804  -0.4198  -0.2603   2.8621  
    ## 
    ## Coefficients:
    ##                    Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)      -4.025e+00  5.635e-01  -7.143  9.1e-13 ***
    ## Albumin_diff      6.868e-01  2.531e-01   2.714  0.00665 ** 
    ## ALP_diff          1.003e-03  1.406e-03   0.714  0.47544    
    ## ALT_diff          2.852e-05  7.942e-04   0.036  0.97135    
    ## AST_diff          1.865e-04  4.892e-04   0.381  0.70307    
    ## Bilirubin_diff    4.298e-02  2.103e-02   2.044  0.04098 *  
    ## BUN_diff          1.910e-02  6.328e-03   3.017  0.00255 ** 
    ## Cholesterol_diff  5.767e-03  4.217e-03   1.368  0.17144    
    ## Creatinine_diff  -1.058e-01  8.330e-02  -1.270  0.20419    
    ## DiasABP_diff     -1.396e-02  9.392e-03  -1.487  0.13712    
    ## FiO2_diff         4.869e-01  6.906e-01   0.705  0.48085    
    ## GCS_diff          4.374e-02  5.009e-02   0.873  0.38259    
    ## Glucose_diff      1.647e-03  1.341e-03   1.229  0.21924    
    ## HCO3_diff         2.568e-02  3.403e-02   0.754  0.45059    
    ## HCT_diff         -2.201e-02  2.829e-02  -0.778  0.43653    
    ## HR_diff           4.489e-03  5.827e-03   0.770  0.44102    
    ## K_diff            9.497e-02  1.626e-01   0.584  0.55930    
    ## Lactate_diff      1.227e-01  6.884e-02   1.783  0.07460 .  
    ## MAP_diff          1.892e-04  2.966e-03   0.064  0.94914    
    ## Mg_diff          -6.705e-01  3.348e-01  -2.003  0.04522 *  
    ## Na_diff           5.171e-02  2.748e-02   1.882  0.05987 .  
    ## NIDiasABP_diff    2.526e-03  1.431e-02   0.176  0.85991    
    ## NIMAP_diff       -7.757e-03  1.892e-02  -0.410  0.68189    
    ## NISysABP_diff     1.898e-02  8.513e-03   2.229  0.02580 *  
    ## PaCO2_diff       -2.988e-04  1.391e-02  -0.021  0.98286    
    ## PaO2_diff        -1.815e-03  1.439e-03  -1.261  0.20731    
    ## pH_diff           1.868e+00  1.760e+00   1.061  0.28853    
    ## Platelets_diff   -8.372e-04  1.255e-03  -0.667  0.50489    
    ## RespRate_diff     1.855e-02  1.542e-02   1.203  0.22890    
    ## SaO2_diff        -1.858e-02  2.566e-02  -0.724  0.46890    
    ## SysABP_diff       3.780e-03  5.935e-03   0.637  0.52421    
    ## Temp_diff         2.411e-01  1.253e-01   1.924  0.05432 .  
    ## TroponinI_diff   -1.882e-02  1.114e-02  -1.689  0.09129 .  
    ## TroponinT_diff    2.464e-02  6.472e-02   0.381  0.70346    
    ## Urine_diff       -1.119e-03  3.394e-04  -3.296  0.00098 ***
    ## WBC_diff         -5.133e-03  1.611e-02  -0.319  0.75000    
    ## Weight_diff       7.117e-03  5.451e-03   1.306  0.19165    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 824.13  on 900  degrees of freedom
    ## Residual deviance: 680.80  on 864  degrees of freedom
    ##   (1160 observations deleted due to missingness)
    ## AIC: 754.8
    ## 
    ## Number of Fisher Scoring iterations: 6
    # Albumin_diff, Bilirubin_diff, BUN_diff, Mg_diff, NISysABP_diff, Urine_diff were statistically significant
    
    sum(is.na(icu_patients_df1$NISysABP_diff)) # NISysABP_diff has 453 missing values, so it is left out of the combined model
    ## [1] 453
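
The min, max and diff models above each dropped 1160 observations because of missingness, so it can help to check every candidate column rather than one at a time. The sketch below counts missing values per column and the number of complete rows; the toy data frame and its values are invented for illustration.

```r
# Hypothetical data with scattered missing values.
toy_df <- data.frame(
  in_hospital_death = c(0, 1, 0, 0, 1, 0),
  SOFA          = c(5, 9, NA, 4, 11, 6),
  NISysABP_diff = c(NA, 30, NA, 22, NA, 15),
  Urine_diff    = c(200, 150, 300, NA, 90, 250)
)

# Missing values per column, largest first.
na_counts <- sort(colSums(is.na(toy_df)), decreasing = TRUE)
print(na_counts)

# Rows that a model using all of these columns would actually keep.
n_complete <- sum(complete.cases(toy_df))
print(n_complete)
```

Applied to `icu_patients_df1`, the same two calls would show which variables are responsible for most of the dropped rows before a model is fitted.
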
    ### Model combining the patient descriptors with the significant min/max/diff variables:
    
    minmaxdiffICU_glm <- glm(in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + Gender +
                               #min variables that were significant
                               Bilirubin_min + BUN_min + HCT_min + Lactate_min + Temp_min + Na_min + PaO2_min + Urine_min + 
                               #max variables that were significant
                               BUN_max + Creatinine_max + GCS_max + Platelets_max + Temp_max + Urine_max +
                               #diff variables that were significant
                               Albumin_diff + Bilirubin_diff + BUN_diff + Mg_diff + Urine_diff # NISysABP_diff removed because of missing values
                               ,data=icu_patients_df1, family="binomial")
    
    summary(minmaxdiffICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Age + Length_of_stay + SOFA + 
    ##     SAPS1 + ICUType + Gender + Bilirubin_min + BUN_min + HCT_min + 
    ##     Lactate_min + Temp_min + Na_min + PaO2_min + Urine_min + 
    ##     BUN_max + Creatinine_max + GCS_max + Platelets_max + Temp_max + 
    ##     Urine_max + Albumin_diff + Bilirubin_diff + BUN_diff + Mg_diff + 
    ##     Urine_diff, family = "binomial", data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.1298  -0.5279  -0.3351  -0.1930   3.0284  
    ## 
    ## Coefficients:
    ##                                        Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)                           8.953e+00  4.453e+00   2.010 0.044380 *  
    ## Age                                   2.150e-02  5.569e-03   3.860 0.000113 ***
    ## Length_of_stay                       -5.846e-03  5.935e-03  -0.985 0.324598    
    ## SOFA                                  2.214e-02  2.558e-02   0.865 0.386846    
    ## SAPS1                                 6.715e-02  2.305e-02   2.914 0.003573 ** 
    ## ICUTypeCardiac Surgery Recovery Unit -1.119e+00  3.071e-01  -3.645 0.000268 ***
    ## ICUTypeMedical ICU                    9.494e-02  2.203e-01   0.431 0.666504    
    ## ICUTypeSurgical ICU                   6.277e-02  2.449e-01   0.256 0.797684    
    ## GenderMale                           -4.140e-02  1.530e-01  -0.271 0.786661    
    ## Bilirubin_min                         1.811e-01  6.791e-02   2.667 0.007662 ** 
    ## BUN_min                               4.560e-02  1.559e-02   2.924 0.003452 ** 
    ## HCT_min                              -4.119e-03  1.404e-02  -0.293 0.769259    
    ## Lactate_min                           1.010e-01  5.631e-02   1.794 0.072760 .  
    ## Temp_min                             -1.165e-01  8.649e-02  -1.347 0.178135    
    ## Na_min                               -4.704e-02  1.426e-02  -3.298 0.000973 ***
    ## PaO2_min                              2.825e-05  1.393e-03   0.020 0.983819    
    ## Urine_min                            -8.609e-04  1.913e-03  -0.450 0.652655    
    ## BUN_max                              -1.251e-02  1.690e-02  -0.740 0.459170    
    ## Creatinine_max                       -1.953e-01  7.145e-02  -2.734 0.006259 ** 
    ## GCS_max                              -1.538e-01  2.540e-02  -6.056 1.39e-09 ***
    ## Platelets_max                        -1.065e-03  6.930e-04  -1.536 0.124530    
    ## Temp_max                             -1.684e-02  1.058e-01  -0.159 0.873546    
    ## Urine_max                            -3.776e-03  1.873e-03  -2.016 0.043830 *  
    ## Albumin_diff                          3.366e-01  1.918e-01   1.755 0.079295 .  
    ## Bilirubin_diff                       -1.445e-01  6.881e-02  -2.100 0.035732 *  
    ## BUN_diff                             -7.692e-03  9.286e-03  -0.828 0.407450    
    ## Mg_diff                              -5.779e-02  1.920e-01  -0.301 0.763420    
    ## Urine_diff                            3.300e-03  1.943e-03   1.699 0.089393 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1627.0  on 1964  degrees of freedom
    ## Residual deviance: 1274.6  on 1937  degrees of freedom
    ##   (96 observations deleted due to missingness)
    ## AIC: 1330.6
    ## 
    ## Number of Fisher Scoring iterations: 6
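
The deviance figures printed above can be turned into an overall likelihood-ratio test against the intercept-only model, and they also reproduce the reported AIC. This is a sketch using only numbers copied from the summary; the variable names are illustrative.

```r
# Values copied from the summary(minmaxdiffICU_glm) output above.
null_dev  <- 1627.0; null_df  <- 1964
resid_dev <- 1274.6; resid_df <- 1937

# Likelihood-ratio (deviance) test of the fitted model against the intercept-only model.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For ungrouped binary data, AIC = residual deviance + 2 * (number of coefficients).
n_coef <- lr_df + 1            # 27 slopes plus the intercept
aic    <- resid_dev + 2 * n_coef

print(c(lr_stat = lr_stat, lr_df = lr_df, aic = aic))
print(p_value)
```

The same identity explains the `step()` table: dropping a one-degree-of-freedom term that leaves the deviance at 1274.6 lowers the AIC by 2, from 1330.6 to 1328.6. This test compares the model with the null model only. It is not by itself a goodness-of-fit test for ungrouped binary outcomes.
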
    step_minmaxdiffICU_glm <- step(minmaxdiffICU_glm, trace=1)
    ## Start:  AIC=1330.57
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Gender + Bilirubin_min + BUN_min + HCT_min + Lactate_min + 
    ##     Temp_min + Na_min + PaO2_min + Urine_min + BUN_max + Creatinine_max + 
    ##     GCS_max + Platelets_max + Temp_max + Urine_max + Albumin_diff + 
    ##     Bilirubin_diff + BUN_diff + Mg_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - PaO2_min        1   1274.6 1328.6
    ## - Temp_max        1   1274.6 1328.6
    ## - Gender          1   1274.6 1328.6
    ## - HCT_min         1   1274.7 1328.7
    ## - Mg_diff         1   1274.7 1328.7
    ## - Urine_min       1   1274.8 1328.8
    ## - BUN_max         1   1275.1 1329.1
    ## - BUN_diff        1   1275.3 1329.3
    ## - SOFA            1   1275.3 1329.3
    ## - Length_of_stay  1   1275.6 1329.6
    ## - Temp_min        1   1276.4 1330.4
    ## <none>                1274.6 1330.6
    ## - Platelets_max   1   1277.0 1331.0
    ## - Urine_diff      1   1277.4 1331.4
    ## - Albumin_diff    1   1277.6 1331.6
    ## - Lactate_min     1   1277.8 1331.8
    ## - Urine_max       1   1278.6 1332.6
    ## - Bilirubin_diff  1   1279.3 1333.3
    ## - Bilirubin_min   1   1282.1 1336.1
    ## - Creatinine_max  1   1283.0 1337.0
    ## - SAPS1           1   1283.1 1337.1
    ## - BUN_min         1   1284.1 1338.1
    ## - Na_min          1   1285.1 1339.1
    ## - Age             1   1290.1 1344.1
    ## - ICUType         3   1300.6 1350.6
    ## - GCS_max         1   1311.5 1365.5
    ## 
    ## Step:  AIC=1328.57
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Gender + Bilirubin_min + BUN_min + HCT_min + Lactate_min + 
    ##     Temp_min + Na_min + Urine_min + BUN_max + Creatinine_max + 
    ##     GCS_max + Platelets_max + Temp_max + Urine_max + Albumin_diff + 
    ##     Bilirubin_diff + BUN_diff + Mg_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - Temp_max        1   1274.6 1326.6
    ## - Gender          1   1274.6 1326.6
    ## - HCT_min         1   1274.7 1326.7
    ## - Mg_diff         1   1274.7 1326.7
    ## - Urine_min       1   1274.8 1326.8
    ## - BUN_max         1   1275.1 1327.1
    ## - BUN_diff        1   1275.3 1327.3
    ## - SOFA            1   1275.3 1327.3
    ## - Length_of_stay  1   1275.6 1327.6
    ## - Temp_min        1   1276.4 1328.4
    ## <none>                1274.6 1328.6
    ## - Platelets_max   1   1277.0 1329.0
    ## - Urine_diff      1   1277.5 1329.5
    ## - Albumin_diff    1   1277.6 1329.6
    ## - Lactate_min     1   1277.8 1329.8
    ## - Urine_max       1   1278.6 1330.6
    ## - Bilirubin_diff  1   1279.3 1331.3
    ## - Bilirubin_min   1   1282.1 1334.1
    ## - Creatinine_max  1   1283.0 1335.0
    ## - SAPS1           1   1283.1 1335.1
    ## - BUN_min         1   1284.1 1336.1
    ## - Na_min          1   1285.2 1337.2
    ## - Age             1   1290.1 1342.1
    ## - ICUType         3   1300.6 1348.6
    ## - GCS_max         1   1312.4 1364.4
    ## 
    ## Step:  AIC=1326.59
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Gender + Bilirubin_min + BUN_min + HCT_min + Lactate_min + 
    ##     Temp_min + Na_min + Urine_min + BUN_max + Creatinine_max + 
    ##     GCS_max + Platelets_max + Urine_max + Albumin_diff + Bilirubin_diff + 
    ##     BUN_diff + Mg_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - Gender          1   1274.7 1324.7
    ## - HCT_min         1   1274.7 1324.7
    ## - Mg_diff         1   1274.7 1324.7
    ## - Urine_min       1   1274.8 1324.8
    ## - BUN_max         1   1275.1 1325.1
    ## - BUN_diff        1   1275.3 1325.3
    ## - SOFA            1   1275.4 1325.4
    ## - Length_of_stay  1   1275.6 1325.6
    ## <none>                1274.6 1326.6
    ## - Platelets_max   1   1277.0 1327.0
    ## - Temp_min        1   1277.1 1327.1
    ## - Urine_diff      1   1277.5 1327.5
    ## - Albumin_diff    1   1277.7 1327.7
    ## - Lactate_min     1   1277.8 1327.8
    ## - Urine_max       1   1278.7 1328.7
    ## - Bilirubin_diff  1   1279.3 1329.3
    ## - Bilirubin_min   1   1282.2 1332.2
    ## - Creatinine_max  1   1283.1 1333.1
    ## - SAPS1           1   1283.5 1333.5
    ## - BUN_min         1   1284.2 1334.2
    ## - Na_min          1   1285.2 1335.2
    ## - Age             1   1290.5 1340.5
    ## - ICUType         3   1300.6 1346.6
    ## - GCS_max         1   1312.7 1362.7
    ## 
    ## Step:  AIC=1324.67
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + HCT_min + Lactate_min + Temp_min + 
    ##     Na_min + Urine_min + BUN_max + Creatinine_max + GCS_max + 
    ##     Platelets_max + Urine_max + Albumin_diff + Bilirubin_diff + 
    ##     BUN_diff + Mg_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - Mg_diff         1   1274.8 1322.8
    ## - HCT_min         1   1274.8 1322.8
    ## - Urine_min       1   1274.9 1322.9
    ## - BUN_max         1   1275.2 1323.2
    ## - BUN_diff        1   1275.3 1323.3
    ## - SOFA            1   1275.4 1323.4
    ## - Length_of_stay  1   1275.7 1323.7
    ## <none>                1274.7 1324.7
    ## - Platelets_max   1   1277.1 1325.1
    ## - Temp_min        1   1277.2 1325.2
    ## - Urine_diff      1   1277.7 1325.7
    ## - Albumin_diff    1   1277.8 1325.8
    ## - Lactate_min     1   1277.8 1325.8
    ## - Urine_max       1   1279.0 1327.0
    ## - Bilirubin_diff  1   1279.3 1327.3
    ## - Bilirubin_min   1   1282.2 1330.2
    ## - Creatinine_max  1   1283.4 1331.4
    ## - SAPS1           1   1283.7 1331.7
    ## - BUN_min         1   1284.2 1332.2
    ## - Na_min          1   1285.3 1333.3
    ## - Age             1   1290.7 1338.7
    ## - ICUType         3   1300.9 1344.9
    ## - GCS_max         1   1312.7 1360.7
    ## 
    ## Step:  AIC=1322.75
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + HCT_min + Lactate_min + Temp_min + 
    ##     Na_min + Urine_min + BUN_max + Creatinine_max + GCS_max + 
    ##     Platelets_max + Urine_max + Albumin_diff + Bilirubin_diff + 
    ##     BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - HCT_min         1   1274.8 1320.8
    ## - Urine_min       1   1275.0 1321.0
    ## - BUN_max         1   1275.4 1321.4
    ## - BUN_diff        1   1275.4 1321.4
    ## - SOFA            1   1275.5 1321.5
    ## - Length_of_stay  1   1275.8 1321.8
    ## <none>                1274.8 1322.8
    ## - Platelets_max   1   1277.1 1323.1
    ## - Temp_min        1   1277.3 1323.3
    ## - Albumin_diff    1   1277.8 1323.8
    ## - Urine_diff      1   1277.9 1323.9
    ## - Lactate_min     1   1277.9 1323.9
    ## - Urine_max       1   1279.2 1325.2
    ## - Bilirubin_diff  1   1279.4 1325.4
    ## - Bilirubin_min   1   1282.2 1328.2
    ## - Creatinine_max  1   1283.4 1329.4
    ## - SAPS1           1   1283.7 1329.7
    ## - BUN_min         1   1284.6 1330.6
    ## - Na_min          1   1285.3 1331.3
    ## - Age             1   1291.2 1337.2
    ## - ICUType         3   1301.0 1343.0
    ## - GCS_max         1   1312.8 1358.8
    ## 
    ## Step:  AIC=1320.84
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + Lactate_min + Temp_min + Na_min + 
    ##     Urine_min + BUN_max + Creatinine_max + GCS_max + Platelets_max + 
    ##     Urine_max + Albumin_diff + Bilirubin_diff + BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - Urine_min       1   1275.1 1319.1
    ## - BUN_max         1   1275.4 1319.4
    ## - BUN_diff        1   1275.5 1319.5
    ## - SOFA            1   1275.6 1319.6
    ## - Length_of_stay  1   1275.8 1319.8
    ## <none>                1274.8 1320.8
    ## - Platelets_max   1   1277.3 1321.3
    ## - Temp_min        1   1277.4 1321.4
    ## - Albumin_diff    1   1277.8 1321.8
    ## - Urine_diff      1   1277.9 1321.9
    ## - Lactate_min     1   1278.0 1322.0
    ## - Urine_max       1   1279.2 1323.2
    ## - Bilirubin_diff  1   1279.4 1323.4
    ## - Bilirubin_min   1   1282.3 1326.3
    ## - Creatinine_max  1   1283.7 1327.7
    ## - SAPS1           1   1284.5 1328.5
    ## - BUN_min         1   1284.7 1328.7
    ## - Na_min          1   1285.3 1329.3
    ## - Age             1   1291.2 1335.2
    ## - ICUType         3   1301.1 1341.1
    ## - GCS_max         1   1313.0 1357.0
    ## 
    ## Step:  AIC=1319.09
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + Lactate_min + Temp_min + Na_min + 
    ##     BUN_max + Creatinine_max + GCS_max + Platelets_max + Urine_max + 
    ##     Albumin_diff + Bilirubin_diff + BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - BUN_max         1   1275.7 1317.7
    ## - BUN_diff        1   1275.8 1317.8
    ## - SOFA            1   1276.0 1318.0
    ## - Length_of_stay  1   1276.0 1318.0
    ## <none>                1275.1 1319.1
    ## - Platelets_max   1   1277.5 1319.5
    ## - Temp_min        1   1277.7 1319.7
    ## - Albumin_diff    1   1278.0 1320.0
    ## - Lactate_min     1   1278.2 1320.2
    ## - Urine_diff      1   1278.5 1320.5
    ## - Bilirubin_diff  1   1279.6 1321.6
    ## - Urine_max       1   1279.9 1321.9
    ## - Bilirubin_min   1   1282.4 1324.4
    ## - Creatinine_max  1   1284.0 1326.0
    ## - SAPS1           1   1284.9 1326.9
    ## - BUN_min         1   1285.0 1327.0
    ## - Na_min          1   1285.5 1327.5
    ## - Age             1   1291.9 1333.9
    ## - ICUType         3   1301.1 1339.1
    ## - GCS_max         1   1313.3 1355.3
    ## 
    ## Step:  AIC=1317.69
    ## in_hospital_death ~ Age + Length_of_stay + SOFA + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + Lactate_min + Temp_min + Na_min + 
    ##     Creatinine_max + GCS_max + Platelets_max + Urine_max + Albumin_diff + 
    ##     Bilirubin_diff + BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - SOFA            1   1276.4 1316.4
    ## - Length_of_stay  1   1276.7 1316.7
    ## <none>                1275.7 1317.7
    ## - BUN_diff        1   1277.9 1317.9
    ## - Platelets_max   1   1278.1 1318.1
    ## - Temp_min        1   1278.4 1318.4
    ## - Albumin_diff    1   1278.7 1318.7
    ## - Lactate_min     1   1278.7 1318.7
    ## - Urine_diff      1   1279.2 1319.2
    ## - Bilirubin_diff  1   1280.3 1320.3
    ## - Urine_max       1   1280.6 1320.6
    ## - Bilirubin_min   1   1283.1 1323.1
    ## - SAPS1           1   1285.1 1325.1
    ## - Creatinine_max  1   1285.7 1325.7
    ## - Na_min          1   1286.8 1326.8
    ## - Age             1   1292.0 1332.0
    ## - ICUType         3   1301.1 1337.1
    ## - BUN_min         1   1298.9 1338.9
    ## - GCS_max         1   1315.2 1355.2
    ## 
    ## Step:  AIC=1316.39
    ## in_hospital_death ~ Age + Length_of_stay + SAPS1 + ICUType + 
    ##     Bilirubin_min + BUN_min + Lactate_min + Temp_min + Na_min + 
    ##     Creatinine_max + GCS_max + Platelets_max + Urine_max + Albumin_diff + 
    ##     Bilirubin_diff + BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## - Length_of_stay  1   1277.3 1315.3
    ## <none>                1276.4 1316.4
    ## - BUN_diff        1   1278.6 1316.6
    ## - Temp_min        1   1279.1 1317.1
    ## - Platelets_max   1   1279.4 1317.4
    ## - Lactate_min     1   1279.4 1317.4
    ## - Albumin_diff    1   1279.5 1317.5
    ## - Urine_diff      1   1280.1 1318.1
    ## - Bilirubin_diff  1   1281.2 1319.2
    ## - Urine_max       1   1281.5 1319.5
    ## - Bilirubin_min   1   1284.3 1322.3
    ## - Creatinine_max  1   1286.0 1324.0
    ## - Na_min          1   1287.5 1325.5
    ## - Age             1   1292.0 1330.0
    ## - SAPS1           1   1292.3 1330.3
    ## - ICUType         3   1301.1 1335.1
    ## - BUN_min         1   1300.6 1338.6
    ## - GCS_max         1   1320.7 1358.7
    ## 
    ## Step:  AIC=1315.29
    ## in_hospital_death ~ Age + SAPS1 + ICUType + Bilirubin_min + BUN_min + 
    ##     Lactate_min + Temp_min + Na_min + Creatinine_max + GCS_max + 
    ##     Platelets_max + Urine_max + Albumin_diff + Bilirubin_diff + 
    ##     BUN_diff + Urine_diff
    ## 
    ##                  Df Deviance    AIC
    ## <none>                1277.3 1315.3
    ## - BUN_diff        1   1279.5 1315.5
    ## - Temp_min        1   1280.2 1316.2
    ## - Platelets_max   1   1280.3 1316.3
    ## - Lactate_min     1   1280.4 1316.4
    ## - Albumin_diff    1   1280.5 1316.5
    ## - Urine_diff      1   1280.7 1316.7
    ## - Bilirubin_diff  1   1282.1 1318.1
    ## - Urine_max       1   1282.2 1318.2
    ## - Bilirubin_min   1   1285.2 1321.2
    ## - Creatinine_max  1   1286.7 1322.7
    ## - Na_min          1   1288.2 1324.2
    ## - SAPS1           1   1292.7 1328.7
    ## - Age             1   1294.1 1330.1
    ## - ICUType         3   1302.1 1334.1
    ## - BUN_min         1   1301.1 1337.1
    ## - GCS_max         1   1321.1 1357.1
    summary(step_minmaxdiffICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Age + SAPS1 + ICUType + Bilirubin_min + 
    ##     BUN_min + Lactate_min + Temp_min + Na_min + Creatinine_max + 
    ##     GCS_max + Platelets_max + Urine_max + Albumin_diff + Bilirubin_diff + 
    ##     BUN_diff + Urine_diff, family = "binomial", data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.1418  -0.5343  -0.3366  -0.1915   2.9968  
    ## 
    ## Coefficients:
    ##                                        Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)                           8.7050888  3.5243615   2.470 0.013512 *  
    ## Age                                   0.0212706  0.0052960   4.016 5.91e-05 ***
    ## SAPS1                                 0.0727961  0.0185586   3.923 8.76e-05 ***
    ## ICUTypeCardiac Surgery Recovery Unit -1.0525238  0.2963255  -3.552 0.000382 ***
    ## ICUTypeMedical ICU                    0.0893952  0.2183601   0.409 0.682251    
    ## ICUTypeSurgical ICU                   0.0445516  0.2401282   0.186 0.852811    
    ## Bilirubin_min                         0.1833928  0.0673292   2.724 0.006453 ** 
    ## BUN_min                               0.0361351  0.0076288   4.737 2.17e-06 ***
    ## Lactate_min                           0.0978048  0.0552595   1.770 0.076741 .  
    ## Temp_min                             -0.1315688  0.0774720  -1.698 0.089456 .  
    ## Na_min                               -0.0470812  0.0140568  -3.349 0.000810 ***
    ## Creatinine_max                       -0.1997437  0.0695562  -2.872 0.004083 ** 
    ## GCS_max                              -0.1578605  0.0238632  -6.615 3.71e-11 ***
    ## Platelets_max                        -0.0011412  0.0006751  -1.690 0.090977 .  
    ## Urine_max                            -0.0039957  0.0018065  -2.212 0.026982 *  
    ## Albumin_diff                          0.3415711  0.1902130   1.796 0.072538 .  
    ## Bilirubin_diff                       -0.1448016  0.0684550  -2.115 0.034406 *  
    ## BUN_diff                             -0.0117014  0.0078467  -1.491 0.135894    
    ## Urine_diff                            0.0035104  0.0018834   1.864 0.062339 .  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1627.0  on 1964  degrees of freedom
    ## Residual deviance: 1277.3  on 1946  degrees of freedom
    ##   (96 observations deleted due to missingness)
    ## AIC: 1315.3
    ## 
    ## Number of Fisher Scoring iterations: 6
    # predictors retained after step():
    # Age, SAPS1, ICUType
    # Albumin_diff
    # Bilirubin_min, Bilirubin_diff
    # BUN_min, BUN_diff
    # Creatinine_max
    # GCS_max
    # Lactate_min
    # Na_min
    # Platelets_max
    # Temp_min
    # Urine_max, Urine_diff
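    # Added sanity check (a sketch, not part of the original analysis): for a
    # binary-outcome logistic model, the AIC reported by glm() and step() equals
    # residual deviance + 2k, where k counts the estimated coefficients including
    # the intercept (k = null df - residual df + 1). aic_from_deviance() is a
    # hypothetical helper; the inputs are copied from the two summaries above.
    aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef
    aic_full <- aic_from_deviance(1274.6, 1964 - 1937 + 1) # full model, k = 28
    aic_step <- aic_from_deviance(1277.3, 1964 - 1946 + 1) # model kept by step(), k = 19
    round(c(full = aic_full, step = aic_step), 1) # compare with AIC: 1330.6 and AIC: 1315.3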
    1. Present your final model. Your final model should not include all the predictor variables, just a small subset of them, which you have selected based on statistical significance and/or background knowledge.
    finalICU_glm <- glm(in_hospital_death ~ 
                          # predictors retained by step() (AIC-based backward elimination; not all have p < 0.05)
                          Age + SAPS1 + ICUType + Albumin_diff + Bilirubin_min + 
                          Bilirubin_diff + BUN_min + BUN_diff + Creatinine_max + 
                          GCS_max + Lactate_min + Na_min + Platelets_max + 
                          Temp_min + Urine_max + Urine_diff +
                          
                          # predictors that are clinically relevant but not included in above
                          
                          # baseline demographics should be included even if not significant
                          Gender + Length_of_stay + Weight_min + 
                          SOFA + # indicates how well the SOFA score predicts mortality independently of its components
                          
                          # other clinical relevance
                          Albumin_min + # low albumin indicates malnutrition or liver failure
                          Glucose_max + # hyperglycaemia is a stress response
                          HCT_min + # low HCT = anaemia
                          HR_max + # tachycardia may indicate septic shock / inflammation
                          PaO2_min + # hypoxia = inadequate organ perfusion/oxygenation
                          PaCO2_min + # hypercapnia = respiratory / ventilation failure
                          pH_min + # indicates acidaemia / inadequate organ perfusion
                          RespRate_max + # indicates respiratory failure
                          TroponinT_max + # indicates myocardial damage
                          WBC_max # indicates infection
                          
                        ,data=icu_patients_df1, family="binomial")
    summary(finalICU_glm)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Age + SAPS1 + ICUType + Albumin_diff + 
    ##     Bilirubin_min + Bilirubin_diff + BUN_min + BUN_diff + Creatinine_max + 
    ##     GCS_max + Lactate_min + Na_min + Platelets_max + Temp_min + 
    ##     Urine_max + Urine_diff + Gender + Length_of_stay + Weight_min + 
    ##     SOFA + Albumin_min + Glucose_max + HCT_min + HR_max + PaO2_min + 
    ##     PaCO2_min + pH_min + RespRate_max + TroponinT_max + WBC_max, 
    ##     family = "binomial", data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -2.1864  -0.5256  -0.3190  -0.1726   3.0839  
    ## 
    ## Coefficients:
    ##                                        Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)                          11.3197869  6.3092397   1.794  0.07279 .  
    ## Age                                   0.0261029  0.0059357   4.398 1.09e-05 ***
    ## SAPS1                                 0.0535276  0.0242680   2.206  0.02741 *  
    ## ICUTypeCardiac Surgery Recovery Unit -0.9149479  0.3272024  -2.796  0.00517 ** 
    ## ICUTypeMedical ICU                    0.1330375  0.2428895   0.548  0.58388    
    ## ICUTypeSurgical ICU                   0.3070023  0.2676527   1.147  0.25137    
    ## Albumin_diff                          0.2474274  0.2018893   1.226  0.22036    
    ## Bilirubin_min                         0.2069290  0.0698005   2.965  0.00303 ** 
    ## Bilirubin_diff                       -0.1706431  0.0707418  -2.412  0.01586 *  
    ## BUN_min                               0.0407697  0.0081139   5.025 5.04e-07 ***
    ## BUN_diff                             -0.0151092  0.0082071  -1.841  0.06562 .  
    ## Creatinine_max                       -0.1787734  0.0720629  -2.481  0.01311 *  
    ## GCS_max                              -0.1497749  0.0270334  -5.540 3.02e-08 ***
    ## Lactate_min                           0.0564987  0.0599373   0.943  0.34587    
    ## Na_min                               -0.0494656  0.0152171  -3.251  0.00115 ** 
    ## Platelets_max                        -0.0011105  0.0007713  -1.440  0.14992    
    ## Temp_min                             -0.1117099  0.0828565  -1.348  0.17758    
    ## Urine_max                            -0.0037421  0.0019103  -1.959  0.05012 .  
    ## Urine_diff                            0.0032488  0.0019892   1.633  0.10243    
    ## GenderMale                           -0.0517044  0.1608977  -0.321  0.74795    
    ## Length_of_stay                       -0.0076162  0.0061227  -1.244  0.21353    
    ## Weight_min                           -0.0049457  0.0039722  -1.245  0.21310    
    ## SOFA                                  0.0197980  0.0271001   0.731  0.46505    
    ## Albumin_min                          -0.1680877  0.1292172  -1.301  0.19332    
    ## Glucose_max                           0.0002854  0.0007949   0.359  0.71958    
    ## HCT_min                              -0.0082671  0.0150655  -0.549  0.58318    
    ## HR_max                                0.0077495  0.0034199   2.266  0.02345 *  
    ## PaO2_min                              0.0006981  0.0014372   0.486  0.62715    
    ## PaCO2_min                             0.0078284  0.0096972   0.807  0.41950    
    ## pH_min                               -0.4944507  0.7266460  -0.680  0.49622    
    ## RespRate_max                          0.0114649  0.0103635   1.106  0.26861    
    ## TroponinT_max                         0.0534140  0.0343041   1.557  0.11945    
    ## WBC_max                              -0.0090501  0.0105882  -0.855  0.39270    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1549.9  on 1854  degrees of freedom
    ## Residual deviance: 1182.0  on 1822  degrees of freedom
    ##   (206 observations deleted due to missingness)
    ## AIC: 1248
    ## 
    ## Number of Fisher Scoring iterations: 6
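    # Added sketch (not part of the original analysis): the model kept by step() was
    # fitted to 1965 rows (96 deleted for missingness), whereas this larger model
    # used 1855 rows (206 deleted), so their AIC and deviance values are not
    # directly comparable. One option is to refit the smaller model on the rows
    # the larger model used. The toy data and the names toy, small, big and
    # small_refit are hypothetical; on the real objects the analogous (untested)
    # call would be update(step_minmaxdiffICU_glm, data = model.frame(finalICU_glm)).
    set.seed(42)
    toy <- data.frame(y = rbinom(200, 1, 0.3), x1 = rnorm(200), x2 = rnorm(200))
    toy$x2[1:20] <- NA # the extra covariate has missing values
    small <- glm(y ~ x1, data = toy, family = binomial)      # uses all 200 rows
    big <- glm(y ~ x1 + x2, data = toy, family = binomial)   # drops the 20 incomplete rows
    small_refit <- update(small, data = model.frame(big))    # refit on the rows big used
    c(small = nobs(small), big = nobs(big), small_refit = nobs(small_refit))
    anova(small_refit, big, test = "Chisq") # nested comparison on identical rows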
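    # fit a Poisson regression (log link) on the same covariates as above to estimate relative risks,
    # which are preferable to odds ratios when the outcome is common;
    # note: "modified" Poisson regression also requires robust (sandwich) standard errors, which this plain glm() fit does not use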
    
    finalICU_glm_poisson <- glm(in_hospital_death ~ 
                          # predictors retained by step() (AIC-based backward elimination; not all have p < 0.05)
                          Age + SAPS1 + ICUType + Albumin_diff + Bilirubin_min + 
                          Bilirubin_diff + BUN_min + BUN_diff + Creatinine_max + 
                          GCS_max + Lactate_min + Na_min + Platelets_max + 
                          Temp_min + Urine_max + Urine_diff +
                            
                          # baseline demographics should be included even if not significant
                          Gender + Length_of_stay + Weight_min + 
                          SOFA + # indicates how well the SOFA score predicts mortality independently of its components
                          
                          # other clinical relevance
                          Albumin_min + # low albumin indicates malnutrition or liver failure
                          Glucose_max + # hyperglycaemia is a stress response
                          HCT_min + # low HCT = anaemia
                          HR_max + # tachycardia may indicate septic shock / inflammation
                          PaO2_min + # hypoxia = inadequate organ perfusion/oxygenation
                          PaCO2_min + # hypercapnia = respiratory / ventilation failure
                          pH_min + # indicates acidaemia / inadequate organ perfusion
                          RespRate_max + # indicates respiratory failure
                          TroponinT_max + # indicates myocardial damage
                          WBC_max # indicates infection
                          
                        , data=icu_patients_df1, family=poisson(link="log"))
    
    summary(finalICU_glm_poisson)
    ## 
    ## Call:
    ## glm(formula = in_hospital_death ~ Age + SAPS1 + ICUType + Albumin_diff + 
    ##     Bilirubin_min + Bilirubin_diff + BUN_min + BUN_diff + Creatinine_max + 
    ##     GCS_max + Lactate_min + Na_min + Platelets_max + Temp_min + 
    ##     Urine_max + Urine_diff + Gender + Length_of_stay + Weight_min + 
    ##     SOFA + Albumin_min + Glucose_max + HCT_min + HR_max + PaO2_min + 
    ##     PaCO2_min + pH_min + RespRate_max + TroponinT_max + WBC_max, 
    ##     family = poisson(link = "log"), data = icu_patients_df1)
    ## 
    ## Deviance Residuals: 
    ##     Min       1Q   Median       3Q      Max  
    ## -1.8744  -0.5069  -0.3491  -0.2183   2.4834  
    ## 
    ## Coefficients:
    ##                                        Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)                           5.0214843  3.3904946   1.481  0.13859    
    ## Age                                   0.0198792  0.0049400   4.024 5.72e-05 ***
    ## SAPS1                                 0.0339633  0.0204016   1.665  0.09596 .  
    ## ICUTypeCardiac Surgery Recovery Unit -0.7603505  0.2813563  -2.702  0.00688 ** 
    ## ICUTypeMedical ICU                    0.0629493  0.1935242   0.325  0.74497    
    ## ICUTypeSurgical ICU                   0.1931383  0.2185759   0.884  0.37690    
    ## Albumin_diff                          0.1457395  0.1641374   0.888  0.37459    
    ## Bilirubin_min                         0.1464095  0.0568473   2.575  0.01001 *  
    ## Bilirubin_diff                       -0.1263382  0.0579171  -2.181  0.02916 *  
    ## BUN_min                               0.0292204  0.0068460   4.268 1.97e-05 ***
    ## BUN_diff                             -0.0157550  0.0068983  -2.284  0.02238 *  
    ## Creatinine_max                       -0.0890083  0.0548536  -1.623  0.10466    
    ## GCS_max                              -0.0990101  0.0213334  -4.641 3.47e-06 ***
    ## Lactate_min                          -0.0110473  0.0333359  -0.331  0.74035    
    ## Na_min                               -0.0352725  0.0121226  -2.910  0.00362 ** 
    ## Platelets_max                        -0.0006707  0.0006484  -1.034  0.30096    
    ## Temp_min                             -0.0647282  0.0531798  -1.217  0.22354    
    ## Urine_max                            -0.0015792  0.0014410  -1.096  0.27312    
    ## Urine_diff                            0.0011907  0.0015125   0.787  0.43113    
    ## GenderMale                           -0.0834178  0.1320252  -0.632  0.52750    
    ## Length_of_stay                       -0.0042229  0.0048572  -0.869  0.38462    
    ## Weight_min                           -0.0037488  0.0033072  -1.134  0.25700    
    ## SOFA                                  0.0144047  0.0215427   0.669  0.50371    
    ## Albumin_min                          -0.1336429  0.1048927  -1.274  0.20263    
    ## Glucose_max                           0.0001538  0.0006267   0.245  0.80608    
    ## HCT_min                              -0.0108434  0.0126736  -0.856  0.39222    
    ## HR_max                                0.0052571  0.0026488   1.985  0.04718 *  
    ## PaO2_min                              0.0004131  0.0011102   0.372  0.70981    
    ## PaCO2_min                             0.0059266  0.0079216   0.748  0.45437    
    ## pH_min                               -0.0847659  0.2631944  -0.322  0.74740    
    ## RespRate_max                          0.0095491  0.0085073   1.122  0.26167    
    ## TroponinT_max                         0.0334174  0.0247510   1.350  0.17697    
    ## WBC_max                              -0.0071252  0.0086893  -0.820  0.41222    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for poisson family taken to be 1)
    ## 
    ##     Null deviance: 1046.23  on 1854  degrees of freedom
    ## Residual deviance:  769.47  on 1822  degrees of freedom
    ##   (206 observations deleted due to missingness)
    ## AIC: 1381.5
    ## 
    ## Number of Fisher Scoring iterations: 6
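    # fewer predictors reach p < 0.05 than in the logistic model (SAPS1 and Creatinine_max drop out),
    # likely because model-based Poisson standard errors are conservative for a binary outcome;
    # all predictors significant here are also significant in the logistic model,
    # except BUN_diff (p = 0.022 here vs 0.066 in the logistic model)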
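    # Added sketch (not part of the original analysis): modified Poisson regression
    # pairs the Poisson fit with robust (sandwich) standard errors, because the
    # model-based ones assume variance = mean, which overstates the variance of a
    # 0/1 outcome. Below is a base-R HC0 sandwich for a log-link Poisson glm,
    # run on simulated data. robust_se_poisson(), x, y and toy_fit are
    # hypothetical names; packages such as sandwich offer tested equivalents.
    robust_se_poisson <- function(fit) {
      X <- model.matrix(fit)
      mu <- fitted(fit)
      bread <- solve(crossprod(X, X * mu)) # (X' diag(mu) X)^-1, the model-based covariance
      meat <- crossprod(X * (fit$y - mu))  # X' diag((y - mu)^2) X, from squared residuals
      sqrt(diag(bread %*% meat %*% bread))
    }
    set.seed(1)
    x <- rnorm(500)
    y <- rbinom(500, 1, plogis(-1 + 0.5 * x)) # simulated binary outcome
    toy_fit <- glm(y ~ x, family = poisson(link = "log"))
    # the robust SEs are typically the smaller of the two for a binary outcome
    cbind(model_se = sqrt(diag(vcov(toy_fit))), robust_se = robust_se_poisson(toy_fit))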
    
    # examine ORs from logistic regression
    options(scipen=999) # turn off scientific notation
    exp(coef(finalICU_glm))
    ##                          (Intercept)                                  Age 
    ##                        82436.7716848                            1.0264465 
    ##                                SAPS1 ICUTypeCardiac Surgery Recovery Unit 
    ##                            1.0549861                            0.4005375 
    ##                   ICUTypeMedical ICU                  ICUTypeSurgical ICU 
    ##                            1.1422928                            1.3593441 
    ##                         Albumin_diff                        Bilirubin_min 
    ##                            1.2807264                            1.2298952 
    ##                       Bilirubin_diff                              BUN_min 
    ##                            0.8431225                            1.0416122 
    ##                             BUN_diff                       Creatinine_max 
    ##                            0.9850044                            0.8362954 
    ##                              GCS_max                          Lactate_min 
    ##                            0.8609018                            1.0581252 
    ##                               Na_min                        Platelets_max 
    ##                            0.9517379                            0.9988901 
    ##                             Temp_min                            Urine_max 
    ##                            0.8943036                            0.9962649 
    ##                           Urine_diff                           GenderMale 
    ##                            1.0032541                            0.9496095 
    ##                       Length_of_stay                           Weight_min 
    ##                            0.9924128                            0.9950665 
    ##                                 SOFA                          Albumin_min 
    ##                            1.0199953                            0.8452797 
    ##                          Glucose_max                              HCT_min 
    ##                            1.0002854                            0.9917670 
    ##                               HR_max                             PaO2_min 
    ##                            1.0077796                            1.0006984 
    ##                            PaCO2_min                               pH_min 
    ##                            1.0078591                            0.6099058 
    ##                         RespRate_max                        TroponinT_max 
    ##                            1.0115309                            1.0548663 
    ##                              WBC_max 
    ##                            0.9909907
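    # Added sketch (not part of the original analysis): a point OR is easier to
    # interpret alongside a Wald 95% confidence interval, exp(estimate +/- z * SE).
    # wald_or_ci() is a hypothetical helper; the inputs are the Age estimate and
    # standard error copied from summary(finalICU_glm) above. On the fitted object,
    # exp(confint.default(finalICU_glm)) should give the same kind of interval.
    wald_or_ci <- function(estimate, se, level = 0.95) {
      z <- qnorm(1 - (1 - level) / 2) # about 1.96 for a 95% interval
      exp(c(OR = estimate, lower = estimate - z * se, upper = estimate + z * se))
    }
    wald_or_ci(0.0261029, 0.0059357) # Age, per additional year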
    # examine RRs from Poisson regression
    exp(coef(finalICU_glm_poisson))
    ##                          (Intercept)                                  Age 
    ##                          151.6362129                            1.0200782 
    ##                                SAPS1 ICUTypeCardiac Surgery Recovery Unit 
    ##                            1.0345467                            0.4675026 
    ##                   ICUTypeMedical ICU                  ICUTypeSurgical ICU 
    ##                            1.0649728                            1.2130505 
    ##                         Albumin_diff                        Bilirubin_min 
    ##                            1.1568947                            1.1576701 
    ##                       Bilirubin_diff                              BUN_min 
    ##                            0.8813167                            1.0296515 
    ##                             BUN_diff                       Creatinine_max 
    ##                            0.9843684                            0.9148380 
    ##                              GCS_max                          Lactate_min 
    ##                            0.9057335                            0.9890134 
    ##                               Na_min                        Platelets_max 
    ##                            0.9653423                            0.9993295 
    ##                             Temp_min                            Urine_max 
    ##                            0.9373222                            0.9984220 
    ##                           Urine_diff                           GenderMale 
    ##                            1.0011914                            0.9199667 
    ##                       Length_of_stay                           Weight_min 
    ##                            0.9957860                            0.9962582 
    ##                                 SOFA                          Albumin_min 
    ##                            1.0145089                            0.8749024 
    ##                          Glucose_max                              HCT_min 
    ##                            1.0001539                            0.9892151 
    ##                               HR_max                             PaO2_min 
    ##                            1.0052709                            1.0004132 
    ##                            PaCO2_min                               pH_min 
    ##                            1.0059442                            0.9187274 
    ##                         RespRate_max                        TroponinT_max 
    ##                            1.0095948                            1.0339820 
    ##                              WBC_max 
    ##                            0.9929001
    # the ORs and RRs appear very similar --> check the actual differences
    exp(coef(finalICU_glm))-exp(coef(finalICU_glm_poisson))
    ##                          (Intercept)                                  Age 
    ##                     82285.1354719059                         0.0063683588 
    ##                                SAPS1 ICUTypeCardiac Surgery Recovery Unit 
    ##                         0.0204394406                        -0.0669650446 
    ##                   ICUTypeMedical ICU                  ICUTypeSurgical ICU 
    ##                         0.0773199959                         0.1462936040 
    ##                         Albumin_diff                        Bilirubin_min 
    ##                         0.1238316698                         0.0722251056 
    ##                       Bilirubin_diff                              BUN_min 
    ##                        -0.0381942636                         0.0119607014 
    ##                             BUN_diff                       Creatinine_max 
    ##                         0.0006359748                        -0.0785426101 
    ##                              GCS_max                          Lactate_min 
    ##                        -0.0448317494                         0.0691117953 
    ##                               Na_min                        Platelets_max 
    ##                        -0.0136044129                        -0.0004393842 
    ##                             Temp_min                            Urine_max 
    ##                        -0.0430185399                        -0.0021571705 
    ##                           Urine_diff                           GenderMale 
    ##                         0.0020626400                         0.0296427916 
    ##                       Length_of_stay                           Weight_min 
    ##                        -0.0033731879                        -0.0011917417 
    ##                                 SOFA                          Albumin_min 
    ##                         0.0054863813                        -0.0296227050 
    ##                          Glucose_max                              HCT_min 
    ##                         0.0001315695                         0.0025518576 
    ##                               HR_max                             PaO2_min 
    ##                         0.0025087077                         0.0002851903 
    ##                            PaCO2_min                               pH_min 
    ##                         0.0019148808                        -0.3088215213 
    ##                         RespRate_max                        TroponinT_max 
    ##                         0.0019360867                         0.0208842520 
    ##                              WBC_max 
    ##                        -0.0019094248
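    # the exponentiated intercepts differ hugely (by about 82,000), but exp(intercept) is the
    # baseline odds (or risk) when every covariate equals zero (e.g. pH_min = 0), which lies far
    # outside the observed data, so it has no useful interpretation.
    # most other estimates are very similar; the largest gap is for pH_min (OR 0.61 vs RR 0.92)
    
    # the logistic model therefore seems reasonable, provided effects are interpreted as 'odds' rather than 'risk'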
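The comparison above relies on odds ratios approximating risk ratios when the outcome is uncommon. A minimal sketch of that relationship follows, using the identity RR = OR / (1 - p0 + p0 * OR), where p0 is the risk in the reference group. The helper name `or_to_rr` and the baseline risks are hypothetical and chosen only for illustration; for covariate-adjusted odds ratios the conversion is an approximation.

```r
# Convert an odds ratio to a risk ratio for a given baseline risk p0
# (risk in the reference group): RR = OR / (1 - p0 + p0 * OR).
# `or_to_rr` is a hypothetical helper, not part of any package.
or_to_rr <- function(or, p0) {
  stopifnot(all(or > 0), all(p0 >= 0 & p0 < 1))
  or / (1 - p0 + p0 * or)
}

# Hypothetical baseline risks, chosen only for illustration
baseline_risk <- c(0.01, 0.10, 0.30)

# The same OR of 2 corresponds to a smaller RR as the baseline risk grows
rr_for_or_2 <- or_to_rr(or = 2, p0 = baseline_risk)
print(round(rr_for_or_2, 3))
```

The gap between OR and RR widens as the baseline risk increases, which is why the two sets of estimates need not agree for every covariate.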
    1. For your final model, present a set of diagnostic statistics and/or charts and comment on them.
    library(magrittr)
    library(dplyr)
    ## 
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ## 
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ## 
    ##     intersect, setdiff, setequal, union
    # lots of missing data in:
    # Survival, Height, invasive ABP (DiasABP, SysABP) and non-invasive BP (NIDiasABP, NIMAP, NISysABP) variables
    # some missing data in SAPS1 and Weight - correlates with GLM full model missing data
    for(i in 1:length(colnames(icu_patients_df1))){
      print(c(i,colnames(icu_patients_df1[i]), sum(is.na(icu_patients_df1[i]))))
    }
    ## [1] "1"        "RecordID" "0"       
    ## [1] "2"              "Length_of_stay" "0"             
    ## [1] "3"     "SAPS1" "96"   
    ## [1] "4"    "SOFA" "0"   
    ## [1] "5"        "Survival" "1288"    
    ## [1] "6"                 "in_hospital_death" "0"                
    ## [1] "7"    "Days" "0"   
    ## [1] "8"      "Status" "0"     
    ## [1] "9"   "Age" "0"  
    ## [1] "10"           "Albumin_diff" "0"           
    ## [1] "11"          "Albumin_max" "0"          
    ## [1] "12"          "Albumin_min" "0"          
    ## [1] "13"       "ALP_diff" "0"       
    ## [1] "14"      "ALP_max" "0"      
    ## [1] "15"      "ALP_min" "0"      
    ## [1] "16"       "ALT_diff" "0"       
    ## [1] "17"      "ALT_max" "0"      
    ## [1] "18"      "ALT_min" "0"      
    ## [1] "19"       "AST_diff" "0"       
    ## [1] "20"      "AST_max" "0"      
    ## [1] "21"      "AST_min" "0"      
    ## [1] "22"             "Bilirubin_diff" "0"             
    ## [1] "23"            "Bilirubin_max" "0"            
    ## [1] "24"            "Bilirubin_min" "0"            
    ## [1] "25"       "BUN_diff" "0"       
    ## [1] "26"      "BUN_max" "0"      
    ## [1] "27"      "BUN_min" "0"      
    ## [1] "28"               "Cholesterol_diff" "0"               
    ## [1] "29"              "Cholesterol_max" "0"              
    ## [1] "30"              "Cholesterol_min" "0"              
    ## [1] "31"              "Creatinine_diff" "0"              
    ## [1] "32"             "Creatinine_max" "0"             
    ## [1] "33"             "Creatinine_min" "0"             
    ## [1] "34"           "DiasABP_diff" "715"         
    ## [1] "35"          "DiasABP_max" "715"        
    ## [1] "36"          "DiasABP_min" "715"        
    ## [1] "37"        "FiO2_diff" "0"        
    ## [1] "38"       "FiO2_max" "0"       
    ## [1] "39"       "FiO2_min" "0"       
    ## [1] "40"       "GCS_diff" "0"       
    ## [1] "41"      "GCS_max" "0"      
    ## [1] "42"      "GCS_min" "0"      
    ## [1] "43"     "Gender" "0"     
    ## [1] "44"           "Glucose_diff" "0"           
    ## [1] "45"          "Glucose_max" "0"          
    ## [1] "46"          "Glucose_min" "0"          
    ## [1] "47"        "HCO3_diff" "0"        
    ## [1] "48"       "HCO3_max" "0"       
    ## [1] "49"       "HCO3_min" "0"       
    ## [1] "50"       "HCT_diff" "0"       
    ## [1] "51"      "HCT_max" "0"      
    ## [1] "52"      "HCT_min" "0"      
    ## [1] "53"     "Height" "992"   
    ## [1] "54"      "HR_diff" "0"      
    ## [1] "55"     "HR_max" "0"     
    ## [1] "56"     "HR_min" "0"     
    ## [1] "57"      "ICUType" "0"      
    ## [1] "58"     "K_diff" "0"     
    ## [1] "59"    "K_max" "0"    
    ## [1] "60"    "K_min" "0"    
    ## [1] "61"           "Lactate_diff" "0"           
    ## [1] "62"          "Lactate_max" "0"          
    ## [1] "63"          "Lactate_min" "0"          
    ## [1] "64"       "MAP_diff" "0"       
    ## [1] "65"      "MAP_max" "0"      
    ## [1] "66"      "MAP_min" "0"      
    ## [1] "67"      "Mg_diff" "0"      
    ## [1] "68"     "Mg_max" "0"     
    ## [1] "69"     "Mg_min" "0"     
    ## [1] "70"      "Na_diff" "0"      
    ## [1] "71"     "Na_max" "0"     
    ## [1] "72"     "Na_min" "0"     
    ## [1] "73"             "NIDiasABP_diff" "455"           
    ## [1] "74"            "NIDiasABP_max" "455"          
    ## [1] "75"            "NIDiasABP_min" "455"          
    ## [1] "76"         "NIMAP_diff" "455"       
    ## [1] "77"        "NIMAP_max" "455"      
    ## [1] "78"        "NIMAP_min" "455"      
    ## [1] "79"            "NISysABP_diff" "453"          
    ## [1] "80"           "NISysABP_max" "453"         
    ## [1] "81"           "NISysABP_min" "453"         
    ## [1] "82"         "PaCO2_diff" "0"         
    ## [1] "83"        "PaCO2_max" "0"        
    ## [1] "84"        "PaCO2_min" "0"        
    ## [1] "85"        "PaO2_diff" "0"        
    ## [1] "86"       "PaO2_max" "0"       
    ## [1] "87"       "PaO2_min" "0"       
    ## [1] "88"      "pH_diff" "0"      
    ## [1] "89"     "pH_max" "0"     
    ## [1] "90"     "pH_min" "0"     
    ## [1] "91"             "Platelets_diff" "0"             
    ## [1] "92"            "Platelets_max" "0"            
    ## [1] "93"            "Platelets_min" "0"            
    ## [1] "94"            "RespRate_diff" "0"            
    ## [1] "95"           "RespRate_max" "0"           
    ## [1] "96"           "RespRate_min" "0"           
    ## [1] "97"        "SaO2_diff" "0"        
    ## [1] "98"       "SaO2_max" "0"       
    ## [1] "99"       "SaO2_min" "0"       
    ## [1] "100"         "SysABP_diff" "715"        
    ## [1] "101"        "SysABP_max" "715"       
    ## [1] "102"        "SysABP_min" "715"       
    ## [1] "103"       "Temp_diff" "0"        
    ## [1] "104"      "Temp_max" "0"       
    ## [1] "105"      "Temp_min" "0"       
    ## [1] "106"            "TroponinI_diff" "0"             
    ## [1] "107"           "TroponinI_max" "0"            
    ## [1] "108"           "TroponinI_min" "0"            
    ## [1] "109"            "TroponinT_diff" "0"             
    ## [1] "110"           "TroponinT_max" "0"            
    ## [1] "111"           "TroponinT_min" "0"            
    ## [1] "112"        "Urine_diff" "0"         
    ## [1] "113"       "Urine_max" "0"        
    ## [1] "114"       "Urine_min" "0"        
    ## [1] "115"      "WBC_diff" "0"       
    ## [1] "116"     "WBC_max" "0"      
    ## [1] "117"     "WBC_min" "0"      
    ## [1] "118"         "Weight_diff" "146"        
    ## [1] "119"        "Weight_max" "146"       
    ## [1] "120"        "Weight_min" "146"
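The per-column loop above can also be written without a loop. The sketch below uses `colSums(is.na(...))` on a small made-up data frame (`toy_df` is hypothetical and only stands in for `icu_patients_df1`), then keeps the columns that have any missing values.

```r
# Toy data frame standing in for icu_patients_df1 (values are made up)
toy_df <- data.frame(
  RecordID = 1:5,
  SAPS1    = c(10, NA, 12, NA, 9),
  Height   = c(NA, NA, 170, NA, 165)
)

# Count missing values in every column at once
na_counts <- colSums(is.na(toy_df))

# Keep only columns with at least one missing value, largest count first
na_summary <- sort(na_counts[na_counts > 0], decreasing = TRUE)
print(na_summary)
```

This returns a named numeric vector, which is easier to filter and sort than printed character triples.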
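    # remove observations with missing values from the data frame so that its rows
    # match those used in the model fit, because glm() drops incomplete cases automatically
    
    # first remove (by column index) the Survival, Height, invasive ABP and non-invasive BP columns, which have many missing values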
    icu_patients_df1_nm <- icu_patients_df1[, -c(5,34:36,53,73:81, 100:102)]
    icu_patients_df1_nm <- na.omit(icu_patients_df1_nm)
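Dropping columns by position breaks silently if the column order changes. A possible alternative, sketched on a made-up data frame, selects the columns to drop by name pattern. The pattern is an assumption based on the column names listed above (`Survival`, `Height`, names containing `ABP_`, and names starting with `NIMAP_`); `toy_df` is hypothetical.

```r
# Toy data frame with a few of the column names seen above (values made up)
toy_df <- data.frame(
  RecordID    = 1:4,
  Survival    = c(NA, 12, NA, NA),
  Height      = c(NA, 170, NA, 180),
  DiasABP_min = c(50, NA, NA, 60),
  SAPS1       = c(10, 11, NA, 14),
  Age         = c(54, 61, 70, 48)
)

# Flag columns to drop entirely, identified by name rather than position
drop_cols <- grepl("^(Survival|Height)$|ABP_|^NIMAP_", names(toy_df))
toy_nm <- toy_df[, !drop_cols, drop = FALSE]

# Then remove the remaining incomplete rows (here: the row with missing SAPS1)
toy_nm <- na.omit(toy_nm)
print(toy_nm)
```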
    
    
    ### Goodness of fit using bins ###
    
    # add predicted probabilities to the data frame
    icu_patients_df1_nm %>% mutate(predprob=predict(finalICU_glm, type="response"),
                       linpred=predict(finalICU_glm)) %>%
    # group the data into bins based on the linear predictor fitted values
    group_by(cut(linpred, breaks=unique(quantile(linpred, (1:50)/51)))) %>%
    # summarise by bin
    summarise(death_bin=sum(in_hospital_death), predprob_bin=mean(predprob), n_bin=n()) %>%
    # add the standard error of the mean predicted probability for each bin
    mutate(se_predprob_bin=sqrt(predprob_bin*(1 - predprob_bin)/n_bin)) %>%
    # plot it with 95% confidence interval bars
    ggplot(aes(x=predprob_bin, 
               y=death_bin/n_bin, 
               ymin=death_bin/n_bin - 1.96*se_predprob_bin,
               ymax=death_bin/n_bin + 1.96*se_predprob_bin)) +
      geom_point() + geom_linerange(colour="orange", alpha=0.4) +
      geom_abline(intercept=0, slope=1) + 
      labs(x="Predicted probability (binned)",
           y="Observed proportion (in each bin)")

    # the ideal calibration line (intercept 0, slope 1) passes within the 95% CI of most bins, suggesting the model is reasonably well calibrated
    
    ### Goodness of fit using Hosmer Lemeshow stat ###
    
    icu_patients_df1_nm %>% mutate(predprob=predict(finalICU_glm, type="response"),
                       linpred=predict(finalICU_glm)) %>%
    group_by(cut(linpred, breaks=unique(quantile(linpred, (1:50)/51)))) %>%
    summarise(death_bin=sum(in_hospital_death), predprob_bin=mean(predprob), n_bin=n()) %>%
    mutate(se_predprob_bin=sqrt(predprob_bin*(1 - predprob_bin)/n_bin)) -> hl_df
    
    hl_stat <- with(hl_df, sum( (death_bin - n_bin*predprob_bin)^2 /
                                (n_bin* predprob_bin*(1 - predprob_bin))))
    hl <- c(hosmer_lemeshow_stat=hl_stat, hl_degrees_freedom=nrow(hl_df) - 1)
    hl
    ## hosmer_lemeshow_stat   hl_degrees_freedom 
    ##             48.22866             49.00000
    # calculate p-value
    c(p_val=1 - pchisq(hl[1], hl[2])) # the p value is not statistically significant, so there is no evidence of lack of fit
    ## p_val.hosmer_lemeshow_stat 
    ##                  0.5043216
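For reference, the same statistic can be wrapped in a reusable function. This is a minimal sketch on simulated data, not the calculation used above: `hosmer_lemeshow` is a hypothetical helper, it groups on deciles of the fitted probabilities, and it uses the conventional groups minus 2 degrees of freedom, whereas the code above uses bins minus 1.

```r
# Simulated data from a correctly specified logistic model
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-2 + x))
sim_fit <- glm(y ~ x, family = binomial)

# Hypothetical helper: Hosmer-Lemeshow statistic on quantile groups
hosmer_lemeshow <- function(model, n_groups = 10) {
  obs <- model$y          # observed 0/1 outcomes for the rows used in the fit
  p   <- fitted(model)    # fitted probabilities for those same rows
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = n_groups + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  observed <- tapply(obs, grp, sum)     # observed events per group
  expected <- tapply(p, grp, sum)       # expected events per group
  n_g      <- tapply(p, grp, length)    # group sizes
  stat <- sum((observed - expected)^2 / (expected * (1 - expected / n_g)))
  df <- length(observed) - 2            # conventional choice (an assumption here)
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}

hl_sim <- hosmer_lemeshow(sim_fit)
print(hl_sim)
```

Using `include.lowest = TRUE` with breaks that span the full range keeps every observation in a proper bin, so none fall into an `NA` group.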
    ### Brier score ###
    
    get_brier <- function(model){
      predprob <- predict(model, type="response")
      Brier_score <- mean((predprob - icu_patients_df1_nm$in_hospital_death)^2)
      return(Brier_score)
    }
    
    get_brier(finalICU_glm)
    ## [1] 0.09699079
    get_brier(minmaxdiffICU_glm)
    ## Warning in predprob - icu_patients_df1_nm$in_hospital_death: longer object
    ## length is not a multiple of shorter object length
    ## [1] 0.1540638
    get_brier(step_minmaxdiffICU_glm)
    ## Warning in predprob - icu_patients_df1_nm$in_hospital_death: longer object
    ## length is not a multiple of shorter object length
    ## [1] 0.1538637
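    # the final model has the lowest Brier score (lower is better). however, the warnings above show that
    # the predictions from the other two models differ in length from in_hospital_death in icu_patients_df1_nm,
    # so their scores were computed on misaligned, recycled values and are not reliable for comparison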
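The length-mismatch warnings above arise because the observed outcome is taken from one fixed data frame, while each model may have been fitted to a different set of complete rows. One way to avoid this, sketched below on simulated data, is to take the observed outcome from the model object itself. `get_brier_safe` is a hypothetical replacement; it assumes the model was fitted with `glm()` defaults, so that `model$y` is stored.

```r
# Hypothetical helper: Brier score using the model's own response vector,
# so predictions and outcomes always refer to the same rows
get_brier_safe <- function(model) {
  predprob <- fitted(model)   # fitted probabilities for the rows used in the fit
  observed <- model$y         # 0/1 outcome for those same rows
  stopifnot(length(predprob) == length(observed))
  mean((predprob - observed)^2)
}

# Demonstration on simulated data with missing predictor values
set.seed(42)
sim_df <- data.frame(x = rnorm(200))
sim_df$died <- rbinom(200, 1, plogis(-1 + sim_df$x))
sim_df$x[1:20] <- NA          # glm() will silently drop these 20 rows
sim_fit <- glm(died ~ x, family = binomial, data = sim_df)

brier_sim <- get_brier_safe(sim_fit)
print(brier_sim)
```

Note that Brier scores from models fitted to different subsets of rows are still not strictly comparable; refitting all candidate models on the same complete-case data would make the comparison fair.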
    1. Write a paragraph summarising the most important findings of your final model. Include the most important values from the statistical output, and a simple clinical interpretation.

    Create your response to this task here, as a mixture of embedded (knitr) R code and any resulting outputs, and explanatory or commentary text.

    Task 2 (15 marks)

    In this task, you are required to develop a Cox proportional hazards survival model using the icu_patients_df1 data set which adequately explains or predicts the length of survival indicated by the Days variable, with censoring as indicated by the Status variable. You should fit a series of models, maybe three or four, evaluating each one, before you present your final model. Your final model should not include all the predictor variables, just a small subset of them, which you have selected based on statistical significance and/or background knowledge. Aim for between five and ten predictor variables (slightly more or fewer is OK). It is perfectly acceptable to include predictor variables in your final model which are not statistically significant, as long as you justify their inclusion on medical or physiological grounds (you will not be marked down if your medical justification is not exactly correct, but do your best). You should assess each model you consider for goodness of fit and other relevant statistics, and you should assess your final model for violations of assumptions and perform other diagnostics which you think are relevant (and modify the model if indicated, or at least comment on the possible impact of what your diagnostics show). Finally, re-fit your final model to the unimputed data frame (icu_patients_df0.rds) and comment on any differences you find.

    Hints

    1. Select an initial subset of explanatory variables that you will use to predict survival. Justify your choice.

    2. Conduct basic exploratory data analysis on your variables of choice.

    3. Fit appropriate univariate Cox proportional hazards models.

    4. Fit an appropriate series of multivariable Cox proportional hazards models, justifying your approach. Assess each model you consider for goodness of fit and other relevant statistics.

    5. Present your final model. Your final model should not include all the predictor variables, just a small subset of them, which you have selected based on statistical significance and/or background knowledge.

    6. For your final model, present a set of diagnostic statistics and/or charts and comment on them.

    7. Write a very brief paragraph summarising the most important findings of your final model. Include the most important values from the statistical output, and a simple clinical interpretation.

    Create your response to this task here, as a mixture of embedded (knitr) R code and any resulting outputs, and explanatory or commentary text.
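As a starting point for the hints above, the general shape of a Cox model fit and a proportional hazards check might look like the sketch below. It uses the `lung` data set shipped with the `survival` package rather than the assignment data, so the variables `time`, `status`, `age`, `sex` and `ph.ecog` are stand-ins; for the assignment, the outcome would be built from `Days` and `Status` instead.

```r
library(survival)

# Built-in example data; `time` and `status` play the role of Days and Status
cox_fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Coefficient table: log hazard ratios, hazard ratios, SEs and p-values
print(summary(cox_fit)$coefficients)

# Concordance (discrimination) of the fitted model
print(summary(cox_fit)$concordance)

# Test the proportional hazards assumption using scaled Schoenfeld residuals
ph_test <- cox.zph(cox_fit)
print(ph_test)
```

A small p-value for a covariate in the `cox.zph()` output suggests its effect may vary over time, which would warrant a closer look (for example, by plotting `ph_test`).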

    Save, knit and submit

    Reminder: don’t forget to save this file, to knit it to check that everything works, and then submit via the drop box in OpenLearning.

    Submit your assignment

    When you have finished, and are satisfied with your assignment solutions, and this file knits without errors and the output looks the way you want, then you should submit via the drop box in OpenLearning.

    Problems?

    If you encounter problems with any part of the process described above, please contact the course convenor via OpenLearning as soon as possible so that the issues can be resolved in good time, and well before the assignment is due.

    Additional Information

    Each task attracts the indicated number of marks (out of a total of 30 marks for the assignment). The instructions are deliberately open-ended and less prescriptive than the individual assignments to allow you some latitude in what you do and how you go about the task. However, to complete the tasks and gain full marks, you only need to replicate or repeat the steps covered in the course - if you do most or all of the things described in the relevant chapters of the HDAT9600 course, full marks will be awarded.

    Note also that with respect to the model fitting, there are no right or wrong answers when it comes to variable selection and other aspects of model specification. Deep understanding of the underlying medical concepts which govern patient treatment and outcomes in ICUs is not required or assumed, although you should try to gain some understanding of each variable using the links provided. You will not be marked down if your medical justifications are not exactly correct or complete, but do your best, and don’t hesitate to seek help from the course convenor.